THE EFFECT OF PRACTICE ON TEST
INTERCORRELATIONS 


HERBERT WOODROW 
University of Illinois 


Four groups of subjects, varying from forty-nine to eighty-two in 
number, were given practice in from four to seven tests. It is the 
purpose of this paper to present the abundant data thus obtained 
concerning the effect of practice on test intercorrelations and to point 
out the conclusions which are thereby justified. One of the more 
interesting findings is that, if the reliability of the tests is kept con- 
stant, it cannot be regarded as in the least unusual for practice in each 
of two tests to cause a decrease in their intercorrelation. Of the forty-seven
correlations on which the effect of practice has been determined, no less than
twenty-two were lower at the end of practice than
they were initially. Data have also been obtained concerning the 
change in the correlations shown by a practiced test with Otis intelli- 
gence tests, given before and after practice. It was found that quite 
often the effect of practice was to lower the correlation of the practiced 
tests with the intelligence tests. In fact, in the case of all four groups, 
the correlations with the intelligence tests shown by final practice- 
test scores were on the average lower than those shown by initial test 
scores. 

Of the four experiments completed, which will be designated by 
the letters A, B, C, and D, two involved what may be termed long 
practice and two short practice. In each experiment all of the practice 
tests used with the group in question were given at each sitting, as 
group tests, and sittings occurred one per day each day of the week 
except Saturday and Sunday. In the case of both the practice tests 
and the unpracticed end tests, different forms were used at successive 
trials, except in the case of speed of making gates. In the practice 
tests, including the modified Philip’s cancellation test, ten different
forms were used in rotation. In the digit-letter code test, the change
from one trial to another consisted simply in a change in the code and
in the digit cancellation tests, in the particular digits which were to 
be cancelled. All the groups were composed of university students, 
chiefly freshmen and sophomores. The four experiments may be 
characterized as follows, in terms of the number of subjects, the tests
used, and the amount of practice:

Experiment A.—Number of subjects, eighty-two. Practice given 
in four tests: Horizontal adding; digit-letter code; two-letter cancella- 
tion; and four-letter cancellation. Practice continued for sixty-six 
trials. Length of test at each trial: Horizontal adding, ten minutes; 
code test, ten minutes; two-letter and four-letter cancellation, three 
minutes each. Unpracticed tests: The complete Otis Advanced 
Intelligence Examination, form A given before practice and form B 
after practice. 

Experiment B.—Number of subjects, fifty-six. Practice covered 
thirty-nine hour-and-a-half sittings, at each of which practice was 
given in seven tests, as follows: Horizontal adding, ten minutes; 
digit-letter code, ten minutes; spot-pattern test, modified, twenty-four
minutes (only about two-thirds of which was occupied by the
subject’s performance)¹; anagrams, ten minutes; a modified Philip’s
multiple instruction letter-cancellation test, ten minutes; estimation of 
relation between two lengths, one hundred trials, requiring fifteen 
minutes; speed of making gates, ten minutes, with a one-minute pause
after each two minutes of practice. Six of the Otis tests, namely,
directions, proverbs, arithmetic, geometric figures, similarities, and 
narrative completion, were also given, form A of these tests being used 
before practice and form B after practice. 

Experiment C.—Number of subjects, sixty-five. Partly because of 
the difficulty of compiling a sufficient number of different forms in the 
case of analogies, there were only ten practice sittings. At each sitting, 
practice was given in one long list of analogies, average time per list, 
five minutes; one anagrams test, consisting of the rearrangement of 
letters to make lists of ten words of each of a number of specified 
categories, such as “astronomical bodies,” and consuming an average
time of seven and one-half minutes; and two three-digit cancellation 
tests, lasting three minutes each, or six minutes per sitting. No intelli- 
gence tests were given this group. 





¹ A detailed description of all the tests used in this experiment is given in a
paper entitled “The Relation between Abilities and Improvement with Practice,”
J. Ed. Psychol., 1938, xxix, pp. 215-230.
Experiment D.—Number of subjects, forty-nine. At each of ten
sittings, the following practice was given: Analogies, five minutes; 
rearranging words to make sentences, five minutes; rearranging 
syllables to make words, five minutes; horizontal adding, seven min- 
utes; three-digit cancellation, two three-minute periods. The follow- 
ing seven Otis tests were also given: Directions, opposites, proverbs, 
arithmetic, geometric figures, similarities, and narrative completion. 
The A forms were given before practice and the B forms after practice. 

Short fore-practice of about two minutes was given in the case of 
all the tests, in all four experiments. Moreover, in all experiments, 
several tests, the scores of which were not used, were given before the 
experiment proper, merely to accustom the subjects to group-test 
procedure. 

Practice leads as a rule to greater reliability of score, and this 
increase in reliability of itself will of course be an influence towards 
increase in the correlations. According to a commonly accepted concept,
however, the true correlation is the degree of correlation that would be
obtained with perfectly reliable tests, and it is the effect of practice
upon these true test intercorrelations that appears to constitute the
more significant problem. Accordingly, procedures were adopted in all
cases that would prevent the results from being appreciably affected by 
the change with practice in test-reliability. In experiments C and D, 
dealing with short practice, initial and final scores, except in the case
of the analogies test, were taken as the means of the two initial and the
two final trials, respectively, and the intercorrelations of the practice tests
were then corrected for attenuation by using the Spearman-Brown 
reliability coefficients. In the case of analogies, only the initial and 
final single trials were used, and the reliability was calculated from 
the correlation of the scores on the odd and even items of the total 
of sixty items comprising the test. In the case of experiments A and 
B, the procedure was adopted of giving high and approximately equal 
reliabilities to the initial and final scores of each test by averaging 
together that number of successive trials required to yield a Spearman- 
Brown reliability coefficient of about +.94 or +.95. When both 
initial and final correlations are calculated for scores of equal relia- 
bility, the correlations, it is true, are not as high as when corrected for 
attenuation; but, when all tests have the same initial and final reli- 
ability, the relative magnitude of initial and final intercorrelations is 
obviously the same as when both initial and final correlations are 
corrected for attenuation. Consequently in experiments A and B 
correction for attenuation is regarded as unnecessary, though it could
be readily made, if desired, from the data given in Table I. 
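Both adjustments described above rest on standard formulas. The Spearman-Brown formula gives the reliability of the mean of k equivalent trials as kr / (1 + (k − 1)r), where r is the single-trial reliability; solved for k, it tells how many successive trials must be averaged to reach a target such as .94. The correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, and the single-trial reliability of .60 is an assumed figure, not one taken from the experiments.

```python
import math

def spearman_brown(r_single: float, k: float) -> float:
    """Spearman-Brown: reliability of the mean of k parallel trials."""
    return k * r_single / (1 + (k - 1) * r_single)

def trials_needed(r_single: float, target: float) -> int:
    """Spearman-Brown solved for k, rounded up to a whole number of trials.

    Floating-point error could push an exact integer answer up by one;
    acceptable for a sketch.
    """
    k = target * (1 - r_single) / (r_single * (1 - target))
    return math.ceil(k)

def correct_for_attenuation(r_ab: float, r_aa: float, r_bb: float) -> float:
    """Estimated true correlation between tests A and B."""
    return r_ab / math.sqrt(r_aa * r_bb)

# Assumed single-trial reliability of .60 and a target of .94 (illustrative).
k = trials_needed(0.60, 0.94)
print(k, round(spearman_brown(0.60, k), 3))

# When every score has reliability .95, correction divides each observed r
# by the same constant (.95), so initial and final r keep their order.
for r_obs in (0.30, 0.45):
    print(r_obs, round(correct_for_attenuation(r_obs, 0.95, 0.95), 3))
```

With these assumed figures, ten trials give a reliability of .9375 and eleven give about .943, so eleven would be averaged. The second loop illustrates the point made in the text: when initial and final scores share the same reliability, correcting both merely rescales them and cannot reverse an increase or a decrease.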


TABLE I.—EFFECT OF PRACTICE UPON TEST INTERCORRELATIONS

(Experiment A)
Initial Intercorrelations
Final Intercorrelations
Number of subjects, 82.
Increases in r with practice, 5; decreases, 1.


In addition to affecting reliability, practice ordinarily results 
in a change in the dispersion of scores—typically, an increase in the 
case of units of amount done, and a decrease in the case of error scores. 
The effect of change in dispersion upon reliability, and thus upon 
correlation, if any, is sufficiently eliminated when the correlations are 
either corrected for attenuation as in experiments C and D, or, as in 
experiments A and B, calculated from test scores of approximately the 
same reliability as judged by the Spearman-Brown formula. 
Table I shows the initial and final practice-test intercorrelations
obtained from experiments A, B, C, and D, respectively. In the case
of C and D, the coefficients corrected for attenuation are given in italics
immediately below the corresponding uncorrected coefficients. The
table also shows, in the case of every test, the Spearman-Brown
reliability coefficient, the mean, and the standard deviation (σ).

TABLE I.—Continued
(Experiment B)
Number of subjects, 56.
Scores on test 4 are error scores.
Scores on test 3 are the σ value, in a normal distribution, of the percentage
correct.
Increases in r with practice, 6 (one a negative increase); decreases, 15.

TABLE I.—Continued
(Experiment C)
Initial Intercorrelations
Number of subjects, 65.
Under each raw coefficient, the same coefficient after correction for attenuation
is given in italics.
The categories test consisted of rearranging letters to make words of specified
categories.
Increases in r with practice, 9; decreases, 1.

TABLE I.—Continued
(Experiment D)
Number of subjects, 49.
Test 2 consisted of rearranging words to make sentences; test 3, of rearranging
syllables to make words.
Increases in r with practice, 5; decreases, also 5.
Number of increases with practice in entire table, 25; decreases, 22.

Table I reveals that the four experiments taken together give a 
much more adequate picture of the effect of practice upon test inter- 
correlations than would any one of them. Thus, of the two long prac- 
tice experiments, A gave results which suggest that practice ordinarily 
produces an increase in intercorrelation while B yielded data showing a 
decrease to be the more usual occurrence. Likewise, the data obtained 
by experiment C show a great predominance of increases while the 
data yielded by D show increases and decreases of exactly equal fre- 
quency. Variation in the outcome of the four experiments is probably 
due mainly to the use in each case of a completely different group of 
subjects and largely different sets of practice tests. The amount of 
time devoted to practice may also be a factor. It seems quite possible,
too, that the nature of the whole set of tests practiced may influence 
the way in which the subjects attain their gains in single tests. If this 
be true, the effect of practice upon the intercorrelation of a given 
pair of tests may to some extent depend upon the nature of the other 
tests practiced at the same sittings. 

In spite of the considerable variation in the results of the four
experiments, there was little variation from one experiment to another
in the direction in which practice changed the correlation of a given
pair of tests. The only pair of tests used in both of the long-practice
experiments, A and B, was horizontal adding and the digit-letter code; and
this pair showed an increase with practice in their intercorrelation, 
from .267 to .467 in A and from .312 to .375 in B. In the case of 
experiments C and D, the three common instances are the following: 
Horizontal adding and three-digit cancellation, which showed a 
decrease in intercorrelation from .557 to .503 in C and from .359 to 
.257 in D; horizontal adding and analogies, which changed from .350 
to .364 in C and from −.053 to +.082 in D; and analogies and three-digit
cancellation, which changed from .161 to .304 in C and from
−.073 to .365 in D. The one inconsistency in the direction of change
in test intercorrelation was in the case of horizontal adding and digit 
cancellation. While the correlation between horizontal adding 
and three-digit cancellation decreased with practice in both C and D, 
the correlation between horizontal adding and either two-digit or 
four-digit cancellation showed a marked increase in the long practice 
experiment A. 

The data of Table I establish beyond doubt that, with reliability 
held constant, practice may result in no significant change or in either 
an increase or a decrease in test intercorrelation. There appears to
be no simple way, at present, of predicting whether the correlation 
between two tests will increase or decrease with practice in both tests. 
The change in intercorrelation with practice does not depend to any 
conspicuous degree upon the extent to which the two tests initially 
involve the same factors, that is, upon whether the initial correlation 
is high or low. The outcome can be determined, apparently, only by 
experiment in each particular case. 

Particularly interesting is the effect of practice on the correlation 
of the practiced test with intelligence tests. The results obtained on 
this question are shown in Table II. In the case of experiments 
A, B, and D, Otis intelligence tests were given in two forms, one form 
being used before practice and the other after practice. Any effect
upon correlation due to a difference in reliability of the two forms of
the Otis tests was eliminated either by correlating the practice-test
scores with the amalgamated score from the two Otis forms or by
averaging the correlations obtained with the two forms. For example, in B,
initial horizontal adding score was correlated with the average score
of twelve Otis tests, six of which belonged to form A and six of which were
the corresponding tests of form B. Final horizontal adding scores
were correlated with this same average Otis score, i.e., the average of
the twelve scores. A similar procedure was used in the case of all the
other practiced tests, and also in the case of experiment D. In the 
case of experiment A, both initial and final practice scores were corre- 
lated separately with total Otis A score and total Otis B score, and then 
the correlations with the two Otis forms were averaged. The practice 
test scores, as already pointed out, had about the same reliability 
initially as finally, or else, as in D, correlations therewith were corrected 
for attenuation. 
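The two devices just described, which keep the Otis criterion identical for initial and final practice scores, can be sketched as follows. This is a minimal illustration on synthetic data generated with a fixed random seed; the helper names are hypothetical, and a simple average of two form totals stands in for the paper's averages of twelve or fourteen subtest scores.

```python
import random
import statistics

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
n = 56  # group size borrowed from experiment B; the scores themselves are synthetic
ability = [random.gauss(0, 1) for _ in range(n)]
otis_a = [g + random.gauss(0, 0.6) for g in ability]  # Otis form A (before practice)
otis_b = [g + random.gauss(0, 0.6) for g in ability]  # Otis form B (after practice)
initial = [0.6 * g + random.gauss(0, 1) for g in ability]
final = [0.4 * g + random.gauss(0, 1) for g in ability]

# Device 1 (as in B and D): one composite criterion built from both forms.
composite = [(a + b) / 2 for a, b in zip(otis_a, otis_b)]

# Device 2 (as in A): correlate with each form separately, then average.
def averaged_r(scores):
    return (pearson(scores, otis_a) + pearson(scores, otis_b)) / 2

for label, scores in (("initial", initial), ("final", final)):
    print(label, round(pearson(scores, composite), 3), round(averaged_r(scores), 3))
```

Either way, initial and final practice scores are compared against exactly the same criterion, so any difference in reliability between Otis forms A and B affects both correlations alike.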

Table II shows that final scores, that is, the scores of the practiced 
tests at the end of practice, in the great majority of cases correlate 
less highly with Otis test scores than do the initial practice test scores. 
This conclusion, it should be remembered, applies only to correlations 
between equally reliable measures. Of the sixteen tests practiced by 
the subjects of experiments A, B, and D, there were five which showed 
an increase and eleven which showed a decrease in their correlation 
with Otis tests. All practice tests showing an initial correlation with 
the Otis tests of +.25 or over, of which there were eight, showed a 
decrease in correlation as the result of practice. These eight tests 
presumably include all the tests that anyone would regard as of 
possible value for measuring intelligence. The results obtained with
two of the practice tests in experiment D, namely, analogies and
rearranging words to make sentences, are particularly interesting,
inasmuch as these two tests were simply longer and more reliable forms
of two tests included in the complete Otis battery. As might be
expected, they showed high initial correlations with the fourteen Otis
tests used as end tests (seven of Form A and seven of Form B, these
seven not including either analogies or rearranging words to make
sentences). Since the correlations dropped with practice, the
conclusion indicated is that practice in certain intelligence tests decreases
their true correlation with other, non-practiced intelligence tests. In
general, the data demonstrate that, with reliability constant, initial
test scores are apt to be better indicators of intelligence than scores
yielded by well-practiced tests.

TABLE II.—CORRELATIONS BETWEEN UNPRACTICED TESTS AND INITIAL AND FINAL
PRACTICE-TEST SCORES
All correlations corrected for unreliability of the practice-test scores.

It is true that the decreases in correlation were often small. It 
should be remembered in this connection, however, that we are dealing 
here with a change in correlation in the case of a fixed group of subjects. 
The ordinary criterion of the significance of a difference between two 
correlation coefficients does not, therefore, apply in the present 
instance. The standard deviation of r in the case of a unique sample 
of subjects is simply that due to the fact that r will vary because of 
errors of response—the sort of errors considered in the correction for 
attenuation. This standard deviation, due solely to unreliability, 
is much smaller than that due to sampling of the population.¹ With
the high reliabilities which characterized the present study, a change 
of ten or fifteen hundredths in the correlation coefficient may be 
regarded as significant. 
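The footnote to this paragraph obtains the standard deviation of r due to unreliability alone by differentiating r_AB = r_∞ √(r_AA) √(r_BB) with the true correlation r_∞ held constant, which gives σ_r² = (r_∞²/4)[(r_BB/r_AA)σ²(r_AA) + (r_AA/r_BB)σ²(r_BB) + 2ρσ(r_AA)σ(r_BB)], where ρ is the correlation between the errors of the two reliability coefficients. The sketch below evaluates that expression next to the familiar large-sample sampling error of r, (1 − r²)/√N. Two assumptions are ours, not the paper's: the standard errors of the reliability coefficients are taken as (1 − r²)/√N, and ρ is set to zero (the footnote notes that this term may be neglected). The inputs r_∞ = .50, reliabilities of .95, and N = 56 are illustrative.

```python
import math

def sigma_r_unreliability(r_true, r_aa, r_bb, se_aa, se_bb, rho=0.0):
    """SD of the observed r between tests A and B due only to unreliability.

    Obtained by differentiating r_ab = r_true * sqrt(r_aa) * sqrt(r_bb) with
    r_true held constant; rho is the correlation between the errors of r_aa
    and r_bb (0 by default, i.e. the cross term is neglected).
    """
    var = (r_true ** 2 / 4) * (
        (r_bb / r_aa) * se_aa ** 2
        + (r_aa / r_bb) * se_bb ** 2
        + 2 * rho * se_aa * se_bb
    )
    return math.sqrt(var)

def se_r_sampling(r, n):
    """Large-sample standard error of a Pearson r drawn from a population."""
    return (1 - r ** 2) / math.sqrt(n)

# Illustrative values only (not taken from the paper's tables).
n, r_true, rel = 56, 0.50, 0.95
se_rel = se_r_sampling(rel, n)  # assumed SE of each reliability coefficient
r_obs = r_true * math.sqrt(rel) * math.sqrt(rel)

print("sigma_r from unreliability:", round(sigma_r_unreliability(r_true, rel, rel, se_rel, se_rel), 4))
print("sampling SE of observed r:", round(se_r_sampling(r_obs, n), 4))
```

With these assumed inputs the unreliability SD comes out at well under a tenth of the ordinary sampling SE, which is the contrast the paragraph draws for a fixed group of subjects.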





¹ A formula for such a standard deviation has been derived, at the suggestion
of Professor A. R. Crathorne, of the department of mathematics of the University
of Illinois, from a common formula for correction for attenuation, namely,

$$r_\infty = \frac{r_{AB}}{\sqrt{r_{AA}}\,\sqrt{r_{BB}}}$$

in which $r_\infty$ is the corrected coefficient and $r_{AB}$ the uncorrected one, between the
two tests A and B. Transposing,

$$r_{AB} = r_\infty \sqrt{r_{AA}}\,\sqrt{r_{BB}}$$

in which $r_\infty$, being the true correlation, may be regarded as a value which in any
given case remains constant, no matter what the reliability of the tests. Taking
differentials, squaring, summing, and dividing by N, one obtains the following
formula for the square of the standard deviation of the correlation coefficient
due to unreliability, which may be designated $\sigma_r$:

$$\sigma_r^2 = \frac{r_\infty^2\, r_{BB}\, \sigma_{r_{AA}}^2}{4\, r_{AA}} + \frac{r_\infty^2\, r_{AA}\, \sigma_{r_{BB}}^2}{4\, r_{BB}} + \frac{r_\infty^2\, r_{r_{AA} r_{BB}}\, \sigma_{r_{AA}}\, \sigma_{r_{BB}}}{2}$$

The value of $r_{r_{AA} r_{BB}}$ may be calculated by the Filon and Pearson equations if
the proper data are available (see Kelley: Statistical Method, 1923, p. 179, formula
128), or the whole term in which this value occurs may be neglected, since it will
not seriously affect the value of $\sigma_r$.

The data presented above appear to constitute an adequate
answer to the question originally raised by Hollingworth, whether
intelligence would not be better measured by scores yielded by tests
after long practice than by initial scores. The answer is no.
Hollingworth’s own data, obtained from practice given to thirteen subjects
on unchanged test forms, were inadequate to answer the question he
raised.¹ G. S. Gates, with a group of twenty-three subjects who were
given from twenty-two to twenty-nine trials in five tests, obtained data
which are apparently interpreted as favoring a positive answer to
the question.² Gates, however, gave no intelligence tests; and, while
the stated conclusion is that even after correction for attenuation
the intercorrelations increase with practice, the table showing these
coefficients³ reveals decreases in four of the ten intercorrelations.
Gundlach, on the other hand, who gave thirty-nine subjects twenty-five
trials in three tests, concludes that there is no constant tendency
for the correlations to increase with practice,⁴ since, while two of his
test intercorrelations showed an increase, the third remained about
constant. The data presented above not only support Gundlach’s
conclusion but go further in showing also that a decrease in test
intercorrelation with practice is by no means an uncommon occurrence.
They also show that the effect of practice may be to lessen the
correlation between the test practiced and such intelligence tests as
those included in the Otis battery and, indeed, that a slight lowering
of the correlation of the practiced test with intelligence, reliability
being kept constant, is a rather usual result.

¹ See Gundlach, R.: “The Effects of Practice on the Correlations of Three Mental
Tests.” J. Educ. Psychol., Vol. xvii, 1926, pp. 388-391.

² Gates, G. S.: “Individual Differences as Affected by Practice.” Arch. of
Psychol., No. 58, 1922, p. 36.

³ Op. cit., p. 33.

⁴ Op. cit.
A STUDY OF THE OBSERVED RELATIONSHIP
BETWEEN PERSISTENCE TEST RESULTS,
INTELLIGENCE INDICES, AND ACADEMIC
SUCCESS


DAVID G. RYANS 
William Woods College, Fulton, Missouri 


The concept of education has undergone considerable expansion 
within recent years. Whereas the prevalent view a short while ago 
embraced little more than an information-providing function, today
it has been extended so as to include the equally important feature 
known as guidance. 

Undoubtedly guidance was often implied in the older ideas regard- 
ing the educational problem. It was assumed that the greater the 
knowledge, general and specific, a man possessed, the more efficient 
he would be in making the adjustments required by life. He was 
offered a widely diversified basic curriculum in school to acquaint 
him with the fundamental skills, the world in which he lived, and the 
objects or materials capable of aesthetic appreciation. Later, training 
was concentrated in a field of the student’s choice, or lot, and he 
became prepared ostensibly to gain his livelihood at some trade or 
profession. But no real effort was extended toward his induction into
the give-and-take, highly specialized life of the adult.

Education, broadly conceived, today aims toward an understanding 
of the world and man, and, ultimately, at the adjustment of the 
individual to his worlds, both physical and social. Recognizing that, 
upon leaving the shelter of home and school, the individual is faced 
with a continuous struggle and competition, first, with the natural 
forces (a physical struggle for existence) and, second, with his fellow- 
men (a social and psychological struggle, i.e., competition within the 
group, trade, or profession), education can no longer offer a mere train- 
ing service, discharge at graduation time a mass of people widely 
separated in capacities and skills, and leave the all-important selective 
process to the future. If it does this and no more, education is not 
attaining its aim. 

But it is possible for education, working sincerely and wholly to achieve
its end, to obviate such circumstances and conditions. This is the 
purpose of testing and personnel services in the schools today—to 
attempt to foresee and predict success or failure; to cater more ade- 
quately to individual differences in the school; to prevent the troubles 
resulting from misplacement; and, above all, to plant the seeds of
adequate adjustment and make that adjustment a continuous process 
extending naturally into the individual’s occupational life. 

One of the problems of foremost importance faced by the educator, 
then, is that of student selection and guidance. Both in the secondary 
school and in the college the significance of behavior sampling and 
prediction therefrom is recognized as basic to the fulfillment of
education’s goal, i.e., the effective adjustment of the adult to his physical
and social environments. While in the past education may have been 
satisfied to accept all comers, present them with certain factual mate- 
rials, and then thrust them upon the economic world, such a near- 
sighted view of school responsibility is no longer tenable. Adult life 
is seen in more nearly its true light, as a competitive activity, and the 
school is regarded as the agency through which the selective process 
can best operate. Positive guidance (the accurate prediction of 
success in a given educational or occupational endeavor) has not yet 
been accomplished successfully on any large scale. Negative guidance 
(the prediction of probable failure in certain lines of activity) is possible 
to a greater extent. The methods are admittedly far from perfect. 
With suitable tests of intelligence it has been possible to define rather 
roughly the limits of achievement in school. With tests of specialized 
aptitudes and abilities prediction has been approximated in certain 
of the trades. At present, however, it may be safely said that no 
measuring instrument, or combination of instruments, can actually 
be thought of as capable of foretelling individual success or failure 
efficiently. 

The inefficacy of educational prognosis undoubtedly has its basis 
in the complexity of causes contributing to academic success and the 
lack of instruments which will reliably indicate the presence or absence 
of requisite individual traits. One of the things* which have been an
obstacle to successful prediction of school achievement is, as others have
suggested,† the absence of quantitative estimates of persistence.
While general and special aptitudes have been subjected to measure- 
ment, the capacity of persistence has, for the most part, evaded 
students and research workers in the field of behavior sampling. 





* Success in school work appears to be dependent upon at least five individual 
factors, these, perhaps, overlapping to an extent. They are: (1) Intelligence, (2) 
persistence, (3) motivation, or interest, (4) study habits, and (5) ability to recog- 
nize and select significant facts. 

† Stone, Turney, Ryans, et al.
Several efforts have been made, however, to determine the degree of
persistence possessed by an individual. In a recent attempt, the
writer selected for study seventeen situations* involving (a) continued
effort in learning and problem-solving, (b) inhibition of reflexes and
well-fixed habits, (c) endurance, or resistance to felt fatigue and boredom,
and (d) resistance to extraneous stimuli. It was presumed
that such situations demanded persistent behavior on the part of a 
subject. Product moment correlation coefficients were computed for 
each measure with every other, and the resultant inter-correlations 
were subjected to multiple factor analysis according to the center- 
of-gravity method described by Thurstone. A group factor emerged 
which, for want of a better name, was referred to as a persistence 
factor. After consideration of the various measures in light of their 
factor loadings and of their adaptability for group administration, 
three were chosen to comprise a persistence test. This test measured 
the trait which we have termed persistence through specific situations 
requiring (1) consecutive effort at rational learning (informational 
materials), (2) self-ratings on a functional schedule, and (3) physical 
endurance (arm extension). The coefficient of correlation between 
test and retest for a small sample was 0.82. It was hoped, and sug- 
gested, that such an instrument might aid materially in educational 
selection.
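For readers unfamiliar with Thurstone's center-of-gravity (centroid) method, its first step can be sketched as below. With communality estimates placed in the diagonal of the correlation matrix, each measure's loading on the first centroid factor is its column sum divided by the square root of the sum of all entries, and the first-factor residuals are the correlations minus the products of the loadings. This is a generic illustration on a made-up 4 × 4 matrix, not the writer's seventeen-measure analysis; using the largest off-diagonal correlation in each column as the communality estimate is an assumption, and the sign reflection required before extracting later factors is omitted.

```python
import math

def first_centroid_factor(r):
    """First centroid factor of a correlation matrix (list of lists).

    Diagonal cells are replaced by the largest absolute off-diagonal value in
    each column, a rough communality estimate (an assumption here).
    Returns (loadings, residual matrix).
    """
    n = len(r)
    m = [row[:] for row in r]
    for j in range(n):
        m[j][j] = max(abs(m[i][j]) for i in range(n) if i != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    loadings = [s / math.sqrt(total) for s in col_sums]
    residual = [[m[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]
    return loadings, residual

# Hypothetical intercorrelations of four measures (illustrative only).
R = [
    [1.00, 0.40, 0.30, 0.20],
    [0.40, 1.00, 0.35, 0.25],
    [0.30, 0.35, 1.00, 0.15],
    [0.20, 0.25, 0.15, 1.00],
]
loadings, residual = first_centroid_factor(R)
print([round(a, 3) for a in loadings])
```

A useful check on the arithmetic is that every column of the residual matrix sums to zero after the first centroid factor is removed.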

The results reported here indicate how much the group persistence test
contributes to prediction in school for two samples.
Such data must be considered preliminary, but the worth of the test, 
from a practical standpoint, stands or falls on similar analyses. 

* Quantitative estimates of individual behavior were obtained for
code-deciphering, anagrams (number), anagrams (time), study time, inhibition of
free-association responses, inhibition of wink reflex, inhibition of patellar reflex,
continuous mental work (addition), physical endurance (arm extension), resistance
to distraction, negative transfer, study log, paired-comparison persistence
ratings, graphic scale persistence ratings, honor-point ratio, a persistence schedule,
and resistance to suggestion.

The two samples employed in studying the relation of persistence
test performances to school marks and intelligence indices, (1) a
group of forty junior-college sophomores and (2) another made up of
ninety-two high-school juniors, were tested separately. The Army
Alpha Examination, Revised Form 5, was administered to the high-school
group. Typical intelligence scores, based upon the equating
(through the use of standard-deviation techniques) of estimates
derived from the Army Alpha Examination and the Otis
Self-Administering Test, Higher Examination, Form A, were obtained for
the junior-college sample. Honor-point ratios, determined by allotting
three points for each hour-credit of A, two points for each hour-credit
of B, one point for each hour-credit of C, no honor points for grades of
D, and one negative point for each hour-credit of F, and dividing the sum
by the total number of hour-credits, were employed to indicate college
scholarship. These were converted into standard-deviation scores
with a mean of fifty. Average numerical marks served in a similar
manner to show success in high school.
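The honor-point ratio and its conversion to standard scores can be sketched as below. The grade weights (A = 3, B = 2, C = 1, D = 0, F = −1 per hour-credit) come from the text; the standard deviation of 10 for the converted scores and the use of the population SD are assumptions, since only the mean of fifty is stated, and the transcript and function names are hypothetical.

```python
import statistics

# Honor points per hour-credit, as described in the text.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}

def honor_point_ratio(record):
    """record: list of (letter_grade, hour_credits) pairs."""
    points = sum(POINTS[grade] * hours for grade, hours in record)
    hours = sum(hours for _, hours in record)
    return points / hours

def to_standard_scores(values, mean=50.0, sd=10.0):
    """Linear standard scores; sd=10 is an assumed scale, not stated in the paper."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return [mean + sd * (v - m) / s for v in values]

# A hypothetical transcript: 6 hours of A, 5 of B, 3 of C, 2 of F.
ratio = honor_point_ratio([("A", 6), ("B", 5), ("C", 3), ("F", 2)])
print(round(ratio, 3))  # computed as (18 + 10 + 3 - 2) / 16

group = [ratio, 1.2, 2.4, 0.8, 1.9]
print([round(z, 1) for z in to_standard_scores(group)])
```

Converting to standard scores puts every student's scholarship on the same scale as the typical intelligence scores, which is what allows the two to be tabulated side by side.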

The results for the two groups are stated separately. 

In Table I honor-point ratios and intelligence test performances 
are tabulated by quarters of the persistence score distribution for the 
junior-college group. 
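The tabulation by quarters can be sketched as follows: rank the students by persistence score, split the ranking into four equal groups, and for each group report the mean and SD of the criterion together with the percentage of its members lying above the third quartile or below the first quartile of the whole group's criterion distribution. The scores are synthetic, the function name is hypothetical, and the quartiles use Python's default `statistics.quantiles` method; the paper does not say how its quartiles were computed.

```python
import random
import statistics

def tabulate_by_quarters(persistence, criterion):
    """Group subjects into quarters of the persistence distribution and
    summarize the criterion (e.g., honor-point standard scores) in each."""
    q1, _, q3 = statistics.quantiles(criterion, n=4)
    order = sorted(range(len(persistence)), key=lambda i: persistence[i], reverse=True)
    size = len(order) // 4  # assumes the group size is divisible by four
    rows = []
    for k, label in enumerate(("highest", "second", "third", "lowest")):
        vals = [criterion[i] for i in order[k * size:(k + 1) * size]]
        rows.append((
            label,
            statistics.fmean(vals),
            statistics.pstdev(vals),
            100 * sum(v > q3 for v in vals) / len(vals),
            100 * sum(v < q1 for v in vals) / len(vals),
        ))
    return rows

random.seed(7)
persist = [random.gauss(0, 1) for _ in range(40)]
honor = [50 + 6 * p + random.gauss(0, 8) for p in persist]  # synthetic criterion
for label, mean, sd, pct_hi, pct_lo in tabulate_by_quarters(persist, honor):
    print(f"{label:8s} {mean:5.1f} {sd:5.1f} {pct_hi:5.1f} {pct_lo:5.1f}")
```

With forty students each quarter holds ten, so every "per cent above Q3" or "per cent below Q1" entry is a multiple of ten, as in the table that follows.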

TABLE I.—COMPARISONS OF HONOR-POINT RATIOS AND INTELLIGENCE INDICES OF
FORTY JUNIOR-COLLEGE SOPHOMORES BY QUARTERS OF THEIR DISTRIBUTION
OF PERSISTENCE TEST SCORES

                                 Honor-point ratio            Typical intelligence score
                                 (standard scores)            (standard scores)
                                           Per cent  Per cent             Per cent  Per cent
Quarter of                                 above     below                above     below
persistence scores           Mean    SD    Q3*       Q1†      Mean   SD   Q3‡       Q1§
Highest twenty-five per cent 57.5   10.6   60.0      10.0     48.8  10.8  30.0      10.0
Second twenty-five per cent  52.7    8.7   40.0      20.0     52.1   7.5  30.0      20.0
Third twenty-five per cent   44.9    4.2    0.0      30.0     47.6  10.1  20.0      40.0
Lowest twenty-five per cent  45.8    7.0    0.0      40.0     52.7  10.5  20.0      40.0

Per cent columns give the percentage of the quarter's scores lying above Q3 or
below Q1 of the total distribution.
* Q3 (honor-point ratios) = 57.5.
† Q1 (honor-point ratios) = 43.4.
‡ Q3 (intelligence) = 56.5.
§ Q1 (intelligence) = 44.0.


In general, it may be seen that division into subgroups based upon 
performance upon the persistence test shows corresponding differences 
in mean honor-point ratio. The mean for the highest quarter of the 
persistence distribution is 57.5 (standard score) as compared to that 
of 45.8 for the lowest quarter. Sixty per cent of the upper fourth in
persistence attained ranks in the upper fourth of the honor-point 
distribution. But no individual in the lowest quarter, nor in the next 
to the lowest quarter, on the persistence test demonstrated scholastic 
ability which would place him in the upper half of the group classified 
from the standpoint of honor-points. When the distribution of 
intelligence test scores is considered, no such marked gradations with
persistence are observable. Reliabilities of the differences between 
persistence groups in honor-point ratio and intelligence are given in 
Table II. 


TABLE II
RELIABILITIES OF THE DIFFERENCES BETWEEN MEANS (HONOR-POINT RATIOS AND
INTELLIGENCE INDICES) OF GROUPS CLASSIFIED ACCORDING TO THE QUARTER OF
PERSISTENCE DISTRIBUTION WHICH THEY COMPRISE
(N = 40)

                          D/SD diff.
Groups compared    Honor-point ratio   Intelligence
* and †                  1.1               0.8
* and ‡                  3.5               0.3
* and §                  3.0               0.8
† and ‡                  2.5               [illegible]
† and §                  2.0               0.1
‡ and §                  0.3               1.1

* Highest twenty-five per cent in persistence.
† Second twenty-five per cent in persistence.
‡ Third twenty-five per cent in persistence.
§ Lowest twenty-five per cent in persistence.
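The entries in Table II appear to be critical ratios for independent groups: the difference between two quarter means divided by the standard error of that difference. The sketch below recomputes the honor-point column from the Table I means and standard deviations. It assumes ten students per quarter (forty divided into fourths) and takes the standard error of a mean as SD/√N; the source does not say which convention was used. Under these assumptions it reproduces the printed ratios to within about 0.1.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean, assumed here to be SD / sqrt(N)."""
    return sd / math.sqrt(n)


def critical_ratio(m1: float, sd1: float, m2: float, sd2: float, n: int) -> float:
    """D / SD_diff for two independent groups of equal size n (no correlation term)."""
    sd_diff = math.sqrt(se_mean(sd1, n) ** 2 + se_mean(sd2, n) ** 2)
    return abs(m1 - m2) / sd_diff


# Honor-point ratios (standard scores) from Table I: (mean, SD) per quarter.
HONOR_POINTS = {
    "highest": (57.5, 10.6),
    "second": (52.7, 8.7),
    "third": (44.9, 4.2),
    "lowest": (45.8, 7.0),
}
N_PER_QUARTER = 10  # assumption: forty students split into four equal quarters


def quarter_ratio(a: str, b: str) -> float:
    (m1, sd1), (m2, sd2) = HONOR_POINTS[a], HONOR_POINTS[b]
    return critical_ratio(m1, sd1, m2, sd2, N_PER_QUARTER)


PAIRS = [("highest", "second"), ("highest", "third"), ("highest", "lowest"),
         ("second", "third"), ("second", "lowest"), ("third", "lowest")]

for a, b in PAIRS:
    print(f"{a} vs {b}: {quarter_ratio(a, b):.1f}")
```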


A comparison of high-school marks and Army Alpha examination 
scores in relation to persistence is given for the high-school sample in 
Table III. Differences which are no less pronounced and, in general, 
similar to those obtained with the college group studied are apparent. 
The lowest fourth in persistence test performance attained an average 
mark of 82.5, and 8.7 per cent of its members did school work which 
placed them in the highest fourth of the distribution tabulated with 
respect to marks. On the other hand, the upper fourth in persistence 
showed an average school mark of 88.2 and 43.5 per cent of this group 
placed in the upper fourth of the grade distribution. With regard to 
intelligence, in this sample, the highest fourth in persistence also
seems to be represented by the highest Alpha scores. This, however,
does not appear to be a consistent trend and it is not indicated by the 
three lower quarters of the distribution. 

The reliabilities of the differences between mean school marks and 
mean Alpha scores of groups classified according to quarters of the 
persistence test distribution which they make up are shown in Table IV. 


TABLE III
COMPARISONS OF SCHOOL MARKS AND ARMY ALPHA TEST SCORES OF NINETY-TWO
HIGH-SCHOOL JUNIORS BY QUARTERS OF THEIR DISTRIBUTION OF PERSISTENCE
TEST SCORES

                              School marks                  Army Alpha test scores
                                          Per cent  Per cent             Per cent  Per cent
Quarter of persistence                    above     below                above     below
scores                        Mean   SD   Q3*       Q1†     Mean    SD   Q3‡       Q1§
Highest twenty-five per cent  88.2   4.6  43.5      13.0    123.3  20.8  39.2      13.0
Second twenty-five per cent   85.3   5.7  26.1      21.1    116.2  22.9  26.1      43.5
Third twenty-five per cent    83.7   5.3  26.1      30.5    116.2  19.0  26.1      39.2
Lowest twenty-five per cent   82.5   4.0   8.7      34.8    117.7  14.2  17.4      26.1

Per cent columns give the per cent of the quarter's scores lying above Q3, or below Q1, of the total distribution.
* Q3 (school marks) = 89.3.
† Q1 (school marks) = 80.5.
‡ Q3 (Army Alpha) = 131.1.
§ Q1 (Army Alpha) = 103.0.


The product-moment correlation coefficient between honor-point 
ratios and persistence test scores was calculated to be 0.48 for the 
junior-college group. Between intelligence and persistence, the coefficient
was −0.13, and between honor-point ratios and intelligence
indices, 0.48. It is interesting to note that by combining persistence 
scores and the intelligence indices it is possible to predict honor-point 
ratios to the extent indicated by a multiple R of 0.73, which is sub- 
stantially higher than most coefficients which are obtained in predictive 
efforts. 

For the high-school group, the r between persistence and school 
marks was 0.38; between intelligence and persistence, 0.07; and between 
intelligence and school marks, 0.71 (the investigator’s efforts to account 
for this unusually high correlation between ability and marks have
been largely unrewarded).* Combining persistence and Army Alpha 
scores it was possible to raise the prediction coefficient only 0.08 
(to 0.79) in foretelling success in school as measured by course marks. 
Since the zero-order coefficient between marks and intelligence
was already so high, the gain obtainable from the partial-multiple
techniques was, of course, limited.
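With two predictors, the multiple correlation can be obtained from the three zero-order coefficients alone, using the standard two-predictor formula R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). The sketch below applies it to the coefficients quoted in the two preceding paragraphs; it is a check on the arithmetic, not necessarily the procedure the investigator followed. It returns about 0.73 for the junior-college group and about 0.78 for the high-school group, slightly below the published 0.79, a gap that may reflect rounding of the printed two-decimal coefficients.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion y with two predictors, 1 and 2.

    Standard two-predictor formula using only zero-order coefficients:
    R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2)
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Junior college: honor points with persistence (.48) and with intelligence
# (.48); persistence with intelligence (-.13).
college_r = multiple_r(0.48, 0.48, -0.13)

# High school: marks with persistence (.38) and with intelligence (.71);
# persistence with intelligence (.07).
high_school_r = multiple_r(0.38, 0.71, 0.07)

print(f"junior college R = {college_r:.2f}")
print(f"high school R    = {high_school_r:.2f}")
```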


TABLE IV
RELIABILITIES OF THE DIFFERENCES BETWEEN MEANS (SCHOOL MARKS AND ARMY
ALPHA SCORES) OF GROUPS RANKED ACCORDING TO QUARTER OF PERSISTENCE TEST
DISTRIBUTION WHICH THEY MAKE UP
(N = 92)

                          D/SD diff.
Groups compared    School marks   Army Alpha scores
* and †                1.9              1.1
* and ‡                3.1              1.2
* and §                4.5              1.2
† and ‡                1.0              0.0
† and §                1.9              0.3
‡ and §                0.9              0.3

* Highest twenty-five per cent persistence scores.
† Second twenty-five per cent persistence scores.
‡ Third twenty-five per cent persistence scores.
§ Fourth twenty-five per cent persistence scores.


SUMMARY 


The value of a valid test of persistence as a contributing predictive 
instrument in education is readily recognized. In this report a group 
persistence test, previously described, was observed in relation to 
school success, and, in general, it was found that while unrelated to 
intelligence, persistence test performances were positively related 
to assigned school marks. Multiple correlation coefficients of 0.73 and 
0.79 were obtained when combinations of intelligence test score and 
persistence test score were employed to predict honor-point ratio and 
school mark. It is tentatively suggested that the group persistence
test may be a useful device for educational prognosis at the high-school
and college levels. Caution must, however, be observed in accepting
the results of this study. Only with extensive sampling under
conditions other than those of the original analysis and research can
such evidence be regarded as established.

* The fact that examinations in this school were principally of the short-answer
sort, and that this type of achievement test often correlates highly with
intelligence tests, may account for the obtained r of 0.71.
THE TRUE-FALSE QUESTION AS AN AID IN STUDYING


W. N. KELLOGG AND BRYAN PAYNE 


Indiana University 


INTRODUCTION TO THE PROBLEM 


The widespread use of true-false statements in current workbooks 
and study-manuals has raised a number of questions concerning the 
value of such statements as an aid in studying, particularly at the 
college level. How much does the reviewing of true-false statements 
by college students, in advance of a test, increase the students’
knowledge of the subject-matter of a course? Do students who study
true-false questions tend in the main to “learn the questions”? Or do they
succeed in grasping the principles or points involved? To what
extent does changing the wording of a true-false statement affect the
student’s understanding of the point, provided the point was well
understood in the beginning? These and kindred problems seem to be
without clear-cut answers in the literature.¹ The present study
was undertaken, therefore, to investigate some of the advantages of 
the true-false question, not as a means of examining or testing students, 
but as an aid to students in studying for subsequent examinations which 
might contain identical or similar questions. 


PROCEDURE 


Group I.—Two hundred and eighty-four freshmen and sophomore 
students (Group I) in a single section of general psychology were 
given practice in answering true-false questions. The students were 
instructed to guess at questions when not sure of the answer. The 
material was presented in mimeographed tests of fifty questions. 
The class met daily and the tests were given once each week. Each 
test included only the lecture and textbook material which had been 
covered since the last previous test. In scoring, the total number of 
questions right was taken as the score earned, with unanswered ques- 
tions counted as wrong. This method was preferred over the use of 
any special scoring formula. 





¹ Cf. Kinney, L. B. and Eurich, A. C.: “A summary of investigations comparing
different types of tests.” School and Soc., Vol. xxxvi, 1932, pp. 540-544; “Studies
of the true-false examination.” Psychol. Bull., Vol. xxx, 1933, pp. 305-317.
After three weekly tests had been given, the instructor announced
that the fourth test in the series (hereafter referred to as Test A1) 
would be passed out a day earlier than usual, and that the students 
would be permitted to take it home and answer the questions at their 
leisure. No rules or restrictions of any sort were laid down as to the 
method of getting the correct answers. The A1 papers were returned
by the group the following day and were collected at the beginning of
the class hour. To convey the impression that the incident was then
closed, the instructor began to lecture as usual. Before the end of the
period, however, instruction stopped and a second or “surprise” test
(A2), which was identical in every respect with A1, was distributed.
Test A2 was taken by Group I under strict examination conditions,
with no outside aids of any sort. As a means of furnishing added
motivation, the members of the class were told that Test A2 would
count towards the semester grade while Test A1 would not.

Group II.—To control the possibility that the scores obtained on 
Test A2 may have been influenced by verbatim memorizing of the 
true-false statements, or by a knowledge of the serial order or arrange- 
ment of the separate true and false answers, a second group (Group II) 
was tested as follows: 

A new series of true-false statements (Test B), matched question
for question with A1 and A2, was prepared. Each B question dealt
with a point which had previously been made the basis of an A ques- 
tion. But the wording of the B questions was entirely different from 
the wording of the A questions, and their serial order was randomly 
varied from the A order. Questions of the B test, in the opinion of 
two experienced judges, were equivalent in content and difficulty to 
the A questions, but each B question was distinct in form from its 
corresponding A question. It was supposed by the experimenters 
that any student who really understood the point involved in an A 
question should be able to answer its carefully matched B question 
without difficulty. 

One hundred and sixty-eight freshmen and sophomore students 
served as subjects in Group II, which has been considered equivalent 
to Group I for the purposes of this study. Every effort was made 
to keep the class instruction and assignments identical to those which 
had been used with Group I. The same A questions which had been 
given the first group were given the Group II students as an original 
test (hereafter referred to as A3). For a surprise test the following 
day, Group II was given Test B. 
RESULTS


Evidence from Correlations.—Reliability coefficients, computed by
the odd-even method, were: for the scores of Test A1, .762 ± .016;
for the scores of Test A3, .608 ± .032; and for Test B, .810 ± .017.
All coefficients have been corrected by the Spearman-Brown prophecy
formula.¹
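The Spearman-Brown formula steps an odd-even (half-test) correlation up to the reliability of the whole test: r = 2r_h / (1 + r_h) for a test twice the length of each half. The source reports only the corrected values, so the sketch below also inverts the formula to show the half-test correlations those values imply. These back-calculated figures are illustrations, not data reported in the study.

```python
def spearman_brown(r_part: float, k: float = 2.0) -> float:
    """Reliability of a test k times as long as the part whose reliability is r_part."""
    return k * r_part / (1 + (k - 1) * r_part)


def implied_half_test_r(r_full: float) -> float:
    """Invert the k = 2 case: the odd-even r that would step up to r_full."""
    return r_full / (2 - r_full)


# Corrected reliabilities as printed in the text.
REPORTED = {"A1": 0.762, "A3": 0.608, "B": 0.810}

for test, r_full in REPORTED.items():
    r_half = implied_half_test_r(r_full)  # back-calculated, not reported data
    print(f"{test}: odd-even r about {r_half:.3f}, corrected {spearman_brown(r_half):.3f}")
```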


TABLE I
CORRELATION COEFFICIENTS BETWEEN ORIGINAL AND SURPRISE TEST SCORES

                               Group I              Group II
                               (tests A1 and A2)    (tests A3 and B)
Number of cases                284                  168
r                              .761                 .290
PE of r                        .016                 .047
Difference                                 .471
PE of difference                           .0496
Difference/PE of difference                9.50








Since the reliabilities are reasonably large, especially in view of the 
fact that the tests themselves contained only fifty questions, the 
correlations in Table I were computed. They show the relationship 
between scores on the original and surprise tests. It will be noted that
the coefficient between the A1 and A2 scores is reliably higher than the
coefficient between the A3 and B scores.² This may be interpreted to
mean that the introduction of differently worded questions consider- 
ably disturbed the performance of individual students. 
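The reliability of the difference between the two correlations in Table I can be checked with the usual formula for coefficients obtained from independent groups, assumed here to be the method cited in the footnote: the probable error of the difference is the square root of the sum of the squared probable errors. The sketch uses only the printed r and PE values.

```python
import math


def pe_of_difference(pe_1: float, pe_2: float) -> float:
    """PE of the difference between two coefficients from independent groups."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2)


# Printed values from Table I.
r_group_1, pe_group_1 = 0.761, 0.016  # Group I, tests A1 and A2 (N = 284)
r_group_2, pe_group_2 = 0.290, 0.047  # Group II, tests A3 and B (N = 168)

difference = r_group_1 - r_group_2
pe_diff = pe_of_difference(pe_group_1, pe_group_2)
ratio = difference / pe_diff

print(f"difference = {difference:.3f}")
print(f"PE of difference = {pe_diff:.4f}")
print(f"difference / PE = {ratio:.2f}")
```

It gives a PE of the difference of about .0496 and a ratio of about 9.49; dividing by the rounded .0496, as the table appears to do, gives the printed 9.50.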

Decrease in Score on the Second Test.—In Table II are shown, for 
both groups, the percentages of scores on the surprise test which were 
lower, equal to, or above their corresponding scores on the original test. 
Eighty-nine per cent of the students of Group II scored lower on the 
surprise test than they did on the original test. Yet only forty-six 
per cent of those in Group I fell below their A1 scores when Test A2
was given. Herewith is additional evidence that changing the wording 
of a question (as in the case of Group II) seriously affects a student’s 
ability to answer it. 





¹ Cf. Garrett, H. E.: Statistics in Psychology and Education. New York:
Longmans, Green, 1937, p. 315.

² Cf. ibid., p. 281, for the method of computing the reliability of the difference
between correlation coefficients.
This fact was further substantiated by a tabulation of the separate
responses to the sixteen thousand eight hundred questions of Tests 
A3 and B, which were answered by the students of Group II. The 
results of this tabulation, when arranged so as to compare the matched 
questions on each test, showed that forty-seven of the total of fifty B 
questions were missed a greater number of times than their correspond- 
ing A questions. Of the remaining three pairs, one had an equal num- 
ber of errors in both tests. Only two of the entire fifty B questions 
had fewer errors than the A questions with which they were matched. 


TABLE II
PERCENTAGES OF STUDENTS TAKING SURPRISE TEST WHO SCORED LOWER THAN,
EQUAL TO, OR HIGHER THAN THEIR CORRESPONDING SCORES ON THE ORIGINAL TEST

                                  Group I        Group II
Number of cases                     284            168
Per cents:
  Lower                              46             89
  Equal to                      [illegible]    [illegible]
  Higher                        [illegible]    [illegible]
  Total                             100            100











Effect of Changing the Test Conditions.—The average drop in the
scoring ability of the subjects of Group I, when they answered
questions in the classroom which they had previously answered outside of
class, is shown in Table III. Since each test contained fifty questions,
fifty, according to our method of scoring, was the maximum score 
which could be made. None of the group averages in this table has 
missed the maximum by as much as three full questions. The high 
averages point to the bunching of the scores at the upper ends of the 
distributions. 

The difference between the average scores for Test A1 and Test A2
is surprisingly small (.74). Yet this difference is reliable, as indicated
by the critical ratio of 7.25, which is more than twice the value of 3.00
usually demanded for statistical reliability. The slightly smaller
average score on the surprise test may have been due to the emotional
disturbance occasioned by the surprise element itself. Or it may be
taken simply as a measure of the degree to which the subjects failed
to retain the answers which they had previously made to the Test A1
questions.


TABLE III
COMPARISON OF SCORES ON SAME TEST QUESTIONS PRESENTED IN
DIFFERENT SITUATIONS

Presentation            A1       A2       A3
Group                   I        I        II
Number of cases         284      284      168
Average                 48.14    47.40    48.49
σ                       2.20     2.62     1.95
σ av.                   .131     .155     .150

RELIABILITIES OF DIFFERENCES
                        Between      Between
                        A1 and A2    A1 and A3
σ Diff.                 .102         .199
Difference              .74          .35
Critical ratio          7.25         1.76











It will also be noted from Table III that the average scores obtained 
by Group I and Group II on their original tests (A1 and A3) were so
nearly alike that the difference between the averages is unreliable. 
The critical ratio of 1.76 shows that the difference can easily be 
accounted for by chance factors. The unreliability of the difference 
in this instance may be taken to mean that Groups I and II were 
truly equivalent and that the class instruction and assignments given 
each group were very nearly alike.¹
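Tables III and IV compare means in two ways: with a correlation term when the same subjects took both tests, and without one when the means come from different groups, as the footnote to this section explains. The sketch below implements both cases. It takes the printed σ av. values as the standard errors of the means (to the printed precision they equal σ/√N) and uses r = .761 for A1 and A2 and r = .290 for A3 and B, both from Table I. The function names are illustrative, not the author's.

```python
import math


def sigma_diff(s1: float, s2: float, r: float = 0.0) -> float:
    """Standard error of the difference between two means.

    s1, s2: standard errors of the two means (the "sigma av." rows).
    r: correlation between the two sets of scores; leave at 0 when the
    means come from different groups of subjects.
    """
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)


def critical_ratio(m1: float, s1: float, m2: float, s2: float, r: float = 0.0) -> float:
    return abs(m1 - m2) / sigma_diff(s1, s2, r)


# (average, sigma av.) as printed in Tables III and IV.
A1, A2, A3, B = (48.14, 0.131), (47.40, 0.155), (48.49, 0.150), (44.41, 0.268)

cr_a1_a2 = critical_ratio(*A1, *A2, r=0.761)  # same subjects (Group I)
cr_a1_a3 = critical_ratio(*A1, *A3)           # different groups, no r term
cr_a3_b = critical_ratio(*A3, *B, r=0.290)    # same subjects (Group II)

print(f"A1 vs A2: {cr_a1_a2:.2f}")
print(f"A1 vs A3: {cr_a1_a3:.2f}")
print(f"A3 vs B:  {cr_a3_b:.2f}")
```

The A1 and A3 ratio comes out at about 1.76 and the A3 and B ratio at about 15.3, close to the printed 15.34. For A1 and A2 the sketch gives about 7.3 rather than 7.25, because it carries σ Diff. unrounded (about .101) instead of the printed .102.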

Effect of Changing the Test Questions.—In Table IV are presented
the average scores of Group II on its original and surprise tests (A3
and B), together with the average of the same group on a control or
“normal” test (C) having none of the experimental irregularities
associated with the other two tests. Test C was picked at
random from among twelve other weekly tests given Group II as a fair
sample of the usual or customary true-false test performance of its
members. The subject matter of C was, of course, different from
that of A3 or B. Yet the questions were of about the same difficulty,
and may be considered as roughly equivalent to those in the other
tests. The average score of 42.24 is typical of the performance of the
group. It illustrates what is ordinarily done by students in this
situation when they have no opportunity to review special true-false
statements in preparation for a test.

¹ In the case of tests A1 and A2, the exceptionally small value for $\sigma_{\mathrm{Diff.}}$ (.102)
is in part a function of the formula

$$\sigma_{\mathrm{Diff.}} = \left[\sigma_{\mathrm{av.}1}^{2} + \sigma_{\mathrm{av.}2}^{2} - 2r\,\sigma_{\mathrm{av.}1}\,\sigma_{\mathrm{av.}2}\right]^{1/2}$$

which was employed. The third term in the formula was necessary because the
scores on A1 and A2 were correlated, as will be seen from Table I. Since
no such relationship existed between the scores of A1 and A3, which were obtained
from different groups of subjects, the formula used was

$$\sigma_{\mathrm{Diff.}} = \left[\sigma_{\mathrm{av.}1}^{2} + \sigma_{\mathrm{av.}2}^{2}\right]^{1/2}.$$


TABLE IV
COMPARISON OF SCORES MADE BY THE SAME GROUP OF SUBJECTS IN
ANSWERING DIFFERENT TEST QUESTIONS

Test                        A3       B        C
Average                     48.49    44.41    42.24
σ                           1.95     3.47     3.37
σ av.                       .150     .268     .260
Coefficient of skewness     −1.82    −.95     −.86

Reliabilities of Differences
                  Between      Between      Between
                  A3 and B     A3 and C     B and C
σ Diff.           .266         .300         .373
Difference        4.08         6.25         2.17
Critical ratio    15.34        20.83        5.82














The critical ratios indicate that the differences between all three 
of the averages are reliable. The difference of 6.25 between the aver- 
age scores of Tests A3 and C shows the increase which takes place 
when the student gets the answers by whatever means he may choose. 
Expressed as a per cent of the control score, this represents an increase
of 14.8 per cent. Expressed as a per cent of the maximum score of
fifty, it represents an increase of 12.5 per cent. This result, when
considered in conjunction with the data already given in Table III,
suggests the amount by which true-false scores may be expected to 
rise if questions ‘‘leak out” in advance of a test. 
The difference of 4.08 between A3 and B shows the extent to
which students can obtain the answers to true-false questions in 
their possession (A3) without grasping the real significance of the 
points involved (B). 
Fig. 1. Frequency polygons of the scores of one hundred sixty-eight students on
three true-false tests of fifty questions each. The questions of Test A3 were answered
by the students at their leisure outside of the classroom. Test B covered the same
subject-matter as A3, but its questions were differently worded. This test was taken
in class, shortly after the A3 sheets had been collected by the instructor. Test C was a
control test representing the average or customary performance of the group.





The difference of 2.17 between B and C is a measure of the degree 
to which studying true-false questions actually helps in the mastery 
of course content. If this value is expressed as a per cent of the aver- 
age score of the C test, we may say that the average score of the class 
was raised 5.1 per cent by the special study of true-false questions
permitted in this investigation. Expressed as a per cent of the maximum
score of fifty, the increase is 4.3 per cent.
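The percentage figures in the last few paragraphs follow from Table IV by simple arithmetic: each gain over the control average is divided either by the control average (42.24) or by the maximum possible score (50). A small sketch, with an illustrative function name:

```python
MAX_SCORE = 50           # fifty questions, one point each
CONTROL_AVERAGE = 42.24  # Test C, the "normal" weekly test


def percent_gains(average, control=CONTROL_AVERAGE, max_score=MAX_SCORE):
    """Gain over the control average: in points, as % of control, as % of maximum."""
    gain = average - control
    return gain, 100 * gain / control, 100 * gain / max_score


for test, average in {"A3": 48.49, "B": 44.41}.items():
    gain, pct_control, pct_max = percent_gains(average)
    print(f"{test}: +{gain:.2f} points = {pct_control:.1f}% of control, {pct_max:.1f}% of maximum")
```

For A3 this gives 6.25 points, 14.8 per cent of the control average and 12.5 per cent of the maximum; for B, 2.17 points, 5.1 and 4.3 per cent.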

Frequency polygons of the distributions of scores for the A3, B
and C tests are plotted in Fig. 1. The marked negative skewness in
the A3 curve is similar to that which occurred with Tests A1 and
A2 (Group I). The C distribution, on the other hand, approaches
normal symmetry more closely, while B lies between A3 and C.
Coefficients of skewness are given in Table IV.


SUMMARY AND CONCLUSIONS 


Four hundred fifty-two freshmen and sophomore students divided 
into two equivalent groups were given preliminary practice in 
answering true-false questions. The questions were based upon an 
introductory course in general psychology. Two special true-false
tests (A and B) were then carefully prepared over the same
subject-matter. Each question in Test A was matched with a question in
Test B which covered the same point but was differently worded.
Group I took copies of Test A home with them and answered the
questions by whatever means they chose. Upon returning to class the
next day they were “surprised” with a second copy of the same test
which they were then required to answer under strict examination
conditions. The members of Group II also answered the A questions
outside of class, but were subsequently “surprised” in the classroom 
with Test B. 


(1) Reliability coefficients of the two tests ranged from .608 ±
.032 to .810 ± .017.

(2) Product-moment correlation coefficients between original and
surprise tests were, for Group I, .761 ± .016; for Group II, .290 ± .047.
The lower coefficient in the second instance shows the effect of the 
different wording or phraseology of the questions in Test B. It, 
therefore, gives some insight into the lack of generalization of the 
material on the part of the student. 

(3) Forty-six per cent of Group I scored lower on their surprise
test than they did on their original test, while eighty-nine per cent of 
the Group II subjects scored lower on their surprise tests. 

(4) In Group II, forty-seven of the fifty B questions were answered 
wrong a greater number of times than the corresponding questions of 
Test A. 

(5) Comparisons between the A and B scores of Group II and the 
scores of the same subjects on a ‘‘normal”’ or control test of different 
subject-matter but equivalent difficulty brought out the following
points:

(a) The average score on Test A was reliably greater than the 
average score on the control test by 14.8 per cent of the control score 
or by 12.5 per cent of the maximum score of fifty. This increase,
coupled with the findings from Group I, illustrates the undeservedly 
high score which is obtainable when test questions are known in 
advance of a test itself. 

(b) The average score of Test B was reliably higher than the aver- 
age of the control test by an amount equal to 5.1 per cent of the control 
average or 4.3 per cent of the maximum score of fifty. This increase 
shows in a quantitative way the benefit to be expected from using 
true-false questions in preparation for a subsequent true-false test 
composed of different questions covering the same ground. 











A FURTHER STUDY OF THE PSYCHONEUROTIC 
RESPONSES OF DEAF AND HEARING CHILDREN¹


N. NORTON SPRINGER 
Adolescents’ Court of Brooklyn, New York 
AND 


SYDNEY ROSLOW 
Industrial Teacher Training Faculty, New York State Education Department


This article continues a previous study by Springer² that compared
the psychoneurotic responses of deaf and hearing school children
on the Brown Personality Inventory. In that study,
four hundred deaf children were compared with hearing children, who 
were approximately four years younger than the deaf. This age 
difference was due to the fact that the deaf are retarded in their lan- 
guage development and do not learn to read until a later age than the 
hearing. Since the Brown inventory is entirely verbal in its content,
only older deaf children, who can read, were used as subjects. In 
order to avoid having a control group of academically retarded hearing 
children of the same age, the control group consisted of younger 
children, who were approximately in the same grades as the deaf. 

The study presented in this article is based on the psychoneurotic 
scores of a small group of deaf and hearing children, who were selected 
from the original groups. These children were carefully paired and 
equated in age, sex, intelligence, general social status, and nationality 
of their parents. 

The purpose of the study was to determine whether deaf and hear- 
ing children differ from each other in their manifestations of mal- 
adjusted behavior, as measured by their psychoneurotic responses on a 
personality inventory. 

The Brown Personality Inventory for Children³ was used for the
measurement of psychoneurotic responses. Intelligence scores were 





¹ The authors wish to acknowledge their indebtedness for assistance in this
study to the Works Progress Administration Project on the Deaf, No. 165-97-8000,
sponsored by Professor L. W. Max, of New York University.

² Springer, N. N.: “A Comparative Study of The Psychoneurotic Responses of
Deaf and Hearing Children.” Journal of Educational Psychology, September, 1938,
pp. 459-466.

³ Brown, F.: “A Psychoneurotic Inventory for Children Between Nine and
Fourteen Years of Age.” Journal of Applied Psychology, Vol. xviii, 1934,
pp. 566-577.
obtained on the Goodenough Drawing of A Man Intelligence Test,¹ and
the general social status of the children was determined by rating their
parents’ occupations on the Barr Scale of Occupational Status.²


TABLE I
PERSONAL DATA AND RESULTS OF THE DEAF AND HEARING CHILDREN ON THE
BROWN PERSONALITY INVENTORY

                     Deaf             Hearing                              Critical            Number
                     Mean     SD      Mean     SD      M diff.  SD diff.   ratio     r         of pairs
Personal data:
  Age in months      159.51   9.56    159.41   9.49    .10      1.78       .06       .998      59
  Goodenough IQ       89.18  14.27     89.88  13.33    .70       .76       .92       .925      51
  Barr score           9.91   2.78     10.24   2.52    .33       .55       .60       .644      46
Brown results         32.92  18.73     16.48  14.49  16.44      2.94      5.59       .073      59





The experimental group consisted of fifty-nine deaf boys and girls
between the ages of twelve and fourteen years, attending public day 
and residential schools of New York City. These children were paired 
with a comparable control group of fifty-nine hearing boys and girls, 
from the regular New York City public schools. The personal data 
of these groups are recorded in Table I. The groups were paired by 
sex, within one month of chronological age. More than eighty per cent 
of the children were matched within four points on the Goodenough 
IQ. The remainder were matched within ten points. The deaf and 
hearing children came from a middle-class social group, and were 
matched within one point on the Barr scale. Both groups consisted 
of white children who were born in New York City. All were of 
foreign-born parentage, and they were paired according to the nationality
of their parents. An inspection of Table I shows that the mean
differences between the deaf and the hearing on the above variables
are very small and insignificant. The means of the two groups differ
by only .10 month in chronological age, by less than one point in IQ,
and by .33 point on the Barr scale. The paired
deaf and hearing children can be considered comparably equated.





¹ Goodenough, F.: Measurement of Intelligence by Drawings. Yonkers-on-Hudson,
New York: World Book Co., 1926.

² Barr, F. E.: “Barr Social Rating Scale in Occupational Status,” in Terman,
L. M.: Genetic Studies of Genius. Stanford University Press, Vol. 1, 2nd Ed.,
1925, pp. 542-545.
1. RESULTS ON THE BROWN PERSONALITY INVENTORY


The mean neurotic score for the deaf (Table I), 32.92, is much 
larger and more unfavorable than the mean of 16.48 received by the 
hearing children. The mean difference of 16.44 has a high degree of 
statistical reliability, as is indicated by the critical ratio of 5.59. 

There is a wide range of neurotic scores. Both the deaf and hearing 
children receive neurotic scores between four and seventy-five points. 
The standard deviations of the distributions are very large, and that 
of the hearing children, 14.49, is almost as large as the mean score. 
Since the Brown scale measures atypical symptoms, the distributions 
are skewed. There is some overlapping of the individual scores of the 
deaf and hearing. About forty-six per cent of the deaf receive scores 
that fall at, or below, the hearing mean, while only six per cent of the 
hearing children receive scores that are at, or above, the mean of the 
deaf. 

When the results of this study are compared with Brown’s norms,¹
the deaf children fall within the ninth decile and the hearing children
correspond to the fifth decile. According to Brown’s classification,
the deaf make a “very poor adjustment.” The hearing children are
within the “average adjustment” category.

The findings in regard to the high neurotic score of the deaf children 
of this study are very similar to those reported by Springer² for an
older group of deaf. The younger children receive a mean score that 
is less than four points lower than that received by the older deaf. 
The control group of this study, however, is more representative of a 
well-adjusted group of hearing children. 


2. SIGNIFICANT ITEMS ON THE BROWN PERSONALITY INVENTORY 


The percentage of symptomatic responses on each item in the 
inventory was calculated for the deaf and hearing children. The 
significance of the percentage differences for the deaf and hearing was 
then calculated in terms of the standard error of the difference. The 
results indicate that there is a tendency for the deaf children to give 
more symptomatic responses than the hearing children on practically 
all the questions of the inventory. In forty-four out of the eighty 
questions, the percentage of symptomatic responses of the deaf children 





1 Brown, F.: Suggestion with Regard to Use and Interpretations of the Brown 
Personality Inventory. New York: Psychological Corp., 1935, p. 2. 
2 Springer: Op. cit. 
is larger and statistically significant, i.e., the critical ratios are above
3.00. On only one question, “Have you been told at home that
children should be seen and not heard?” is the percentage of symptomatic
response of the hearing children larger and statistically significant.
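The item comparison described above rests on the standard error of the difference between two percentages. The source does not give the formula. The sketch below assumes the common form for independent groups, SE = √(p₁q₁/N₁ + p₂q₂/N₂), which ignores any correlation arising from the pairing of deaf and hearing children. The item percentages in the example are hypothetical, chosen only to illustrate the 3.00 criterion.

```python
import math


def percentage_difference_cr(p1: float, n1: int, p2: float, n2: int) -> float:
    """Critical ratio for the difference between two proportions.

    Assumes independent groups: SE = sqrt(p1*q1/n1 + p2*q2/n2), q = 1 - p.
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Hypothetical item: 60 per cent of the 59 deaf and 30 per cent of the
# 59 hearing children give the symptomatic response.
cr = percentage_difference_cr(0.60, 59, 0.30, 59)
print(f"critical ratio = {cr:.2f}; above 3.00: {cr > 3.00}")
```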

It is difficult to group in well-defined categories the wide range of 
situations covered by these significant questions. A rough classifi- 
cation indicates that the deaf children make more symptomatic 
responses with regard to physical symptoms, general social adjust- 
ment, home situation, and school adjustment. None of these point to 
a classification of problems peculiar to the deaf. Brown! reports that 
children who make high neurotic scores show reliable differences when 
compared with normal children with regard to these same situations. 
While the results on the Brown inventory indicate that the deaf chil- 
dren are more neurotic than the hearing group, it does not seem possible 
to make any classification of situations which would tend to show the 
deaf as a unique group, so far as personality problems indicated by 
the items on the Brown inventory are concerned. 


3. THE VALIDITY OF THE USE OF THE BROWN INVENTORY 
ON DEAF CHILDREN 


With the use of the Brown inventory there are certain situations 
where the sensory defect in itself might have conditioned the response, 
quite apart from its psychological significance. This was suggested 
by such items as “Do you ever have the feeling that you are not like
other children?” and “Have you ever been told at home that children
should be seen and not heard?” If there were a considerable number 
of questions, whose responses would be conditioned for the deaf 
children by the sensory defect, regardless of the psychological signifi- 
cance, the whole inventory might be invalidated for use with the deaf, 
and the results thus far obtained on the Brown Personality Inventory 
might be valueless. In order to check this possibility, the method 
employed by Welles² in regard to the validity of the use of the Bernreuter
Inventory with hard-of-hearing adults was applied to the results
of this study. 





1 Brown, F.: “A Psychoneurotic Inventory for Children Between Nine and
Fourteen Years of Age.” Journal of Applied Psychology, Vol. xvi, 1934, p. 576.

2 Welles, H. H.: The Measurement of Certain Aspects of Personality Among Hard
of Hearing Adults. Teachers College, Columbia University Contributions to
Education, 1932, No. 545, p. 57.
Two different groups of judges were requested to “Mark the items
on the Brown which, in your opinion, could be answered by deaf
children in just one way, because of the sensory defect itself and regard-
less of the psychological significance.” One group of judges consisted
of three psychologists, who were experienced in the use of adjustment
questionnaires. Two of these psychologists had intimate contact
with the deaf for more than two years. The second group of judges
consisted of three superior deaf adults, all of them college graduates,
who had experience in dealing with deaf children. The results are
given in Table II.

The psychologists were in unanimous agreement on two items and 
in majority agreement on five items. Of these seven items, only 
three of them were found in this study to be significant. The deaf 
judges were in unanimous agreement on two different items, both of 
which were significant. A majority of the deaf judges agreed on three 
items, two of which were significant. When the items selected by the 
two groups of judges were compared, the psychologists were unanimous 
or in majority agreement on the five items selected by the deaf
judges. These common items were: 


1. Do you feel that people do not understand you? 

2. Do you ever have the feeling that you are not like other children? 

3. Have you ever been unable to hear or see for a while? 

4. Have you ever been told at home that children should be seen and not heard?

5. Do your parents ever speak to you in a loud tone of voice? 


TABLE II.—BROWN PERSONALITY INVENTORY ITEMS AGREED UPON BY TWO
GROUPS OF JUDGES AS REQUIRING RESPONSES FROM THE DEAF DETERMINED
BY THE SENSORY DEFECT IN ITSELF APART FROM THEIR
PSYCHOLOGICAL SIGNIFICANCE

                                     Psychologists              Deaf adults
                                 Number                     Number
                                 marked   Significant       marked   Significant

Unanimous agreement .........       2     1 out of 45          2     2 out of 45
Majority agreement ..........       5     2 out of 45          3     2 out of 45
Total number of items marked        9     5 out of 45          8     4 out of 45

Only three of these items, Nos. 2, 3 and 4, were found in this study 
to be significant. 
The number of items selected by both groups of judges which, in
their opinion, could be answered by the deaf children in just one way 
because of the sensory defect itself, is small when compared with the 
eighty items in the inventory. Nevertheless, it was felt to be highly 
desirable to determine the influence of these questions on the children’s 
scores. It must not be overlooked that when any items are eliminated 
from the inventory, the measuring power of the inventory is lessened. 

The inventories for the deaf and hearing children were rescored and 
the means and sigmas were computed when the seven questions 
selected by the majority of the psychologists were eliminated. The 
papers were also rescored with the five questions selected by the deaf 
judges eliminated. The results of the elimination of these items 
(Table III) are surprising. While the means of both the deaf and the
hearing children are reduced by the dropping of the items, the mean
difference between the groups becomes larger and the critical ratios are
increased. The hearing children’s scores are lowered from four to
five points, while the deaf children’s scores are decreased by two or
three points. From these results, it is safe to conclude that, as far as
the judgment of the psychologists and the deaf adults is concerned,
the validity of the use of the Brown Personality Inventory with
deaf children is not affected by any of these questions, and it can be used
with deaf children.


TABLE III.—EFFECTS OF DROPPING OUT CERTAIN INVENTORY ITEMS UPON THE
MEAN DIFFERENCE BETWEEN THE DEAF AND HEARING CHILDREN

                                                               Mean         Critical
Number of items                                                difference   ratio

All eighty items ............................................   16.44        5.59
Less seven items marked by the majority of the psychologists    17.63        7.11
Less five items marked by the majority of deaf adults¹ ......   18.39        7.57

1 The majority of the psychologists, as well as the deaf judges, agreed on these
five items.
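Since each critical ratio in Table III is the mean difference divided by its standard error, the standard error behind each row can be recovered by simple division. This check uses only the tabled figures; the row labels are shortened for illustration.

```python
# Mean difference and critical ratio for each row of Table III.
rows = {
    "all eighty items": (16.44, 5.59),
    "less seven (psychologists)": (17.63, 7.11),
    "less five (deaf adults)": (18.39, 7.57),
}

# CR = difference / SE, so SE = difference / CR.
implied_se = {label: diff / cr for label, (diff, cr) in rows.items()}

for label, se in implied_se.items():
    print(f"{label:28s} SE of difference = {se:.2f}")
```

On these figures the implied standard error falls from about 2.94 to about 2.43 as items are dropped, so the higher critical ratios reflect a smaller standard error as well as a larger mean difference.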


SUMMARY AND CONCLUSIONS 


The Brown Personality Inventory was given to fifty-nine pairs of 
deaf and hearing children in order to determine whether there are 
any group differences between deaf and hearing children in their 
manifestations of maladjusted behavior, as measured by their psy- 
choneurotic responses. The groups consisted of children between 
the ages of twelve and fourteen years. They were closely matched
and equated on the basis of age, sex, intelligence, general social status, 
and nationality, and were found to be comparable on these variables. 

The neurotic scores of the deaf children were much higher and more 
unfavorable than those received by the hearing children. The mean 
difference between the deaf and hearing was of a high degree of 
statistical reliability. The deaf children fell within the “very poor 
adjustment” category, while the hearing children made an “average 
adjustment.” These results are in close accord with those reported
by Springer in regard to the high neurotic scores of older deaf children. 

The deaf children gave more neurotic responses, which were 
reliably higher than those of the hearing children, on forty-four items 
of the inventory. The situations on which they differed can be class- 
ified under the general headings of physical symptoms, general social 
adjustment, home situation, and school adjustment. These situations 
are the same as those reported by Brown for a group of neurotic 
hearing children. None of the individual items on which the deaf 
children differed from the hearing children tend to show the deaf as a 
unique group. 

The extent to which the sensory defect itself, regardless of the 
psychological significance, affected the deaf children’s response to the 
individual items on the Brown Personality Inventory was studied by 
means of the opinion of a group of psychologists and a group of 
superior deaf adults. The majority of both groups agreed that the 
response to five items would be conditioned by the deaf children’s 
sensory defect. When these items were eliminated, the difference 
between the means of the deaf and hearing children was increased 
slightly. These results indicated that the sensory defect did not 
invalidate the use of this inventory with the deaf children. 
COMPREHENSION MATURITY TESTS—A NEW
TECHNIQUE IN MENTAL MEASUREMENT* 


D. D. FEDER 


University of Illinois 


The perception of and reaction to relationships is the chief function 
of the intellect in the solution of any problem situation. It is in the 
former activity that individual differences are primarily and basically 
operative. Some individuals perceive only small, limited areas of 
any problem or stimulating situation. Others perceive much broader, 
more complex aspects of the same situation. Such differences are 
determined by both the hereditary equipment of an individual and 
the way the person has been changed by his training and past experi- 
ence. The illustration of putting a person of normal intelligence and 
a moron in the same physical stimulating situation, and then observing 
the reactions of each has been used frequently in discussions of indi- 
vidual differences. In such circumstances it has been shown that the 
former would have much more complex reactions. The difference 
has been ascribed in the past solely to the different intelligence of the 
two and resultant differences in ability to respond. This is only part 
of the explanation. The difference in ability to structurize or organize 
the physical stimuli into a psychological pattern with meaning is even 
more fundamental, as is also the ability to perceive already structured 
patterns of varying degrees of complexity. The objective stimuli may 
be the same for both, but due to differences in perception, the psycho- 
logical fields in which the tensions are operating are entirely different 
for the two individuals. 

At this point it may be well to insert a warning against confusing 
mere sensory acuity or discrimination with what, for want of a better 
term, we may call the structural pattern of perception. The tendency
to regard either acuity or discrimination as if it were in itself the entire
act of perception is a relatively common error. Some investigators have confused
sensory acuity, span of perception, etc., with the problem of the inter- 
relationships between wholes and parts in a given situation. As a 
consequence, they have secured various correlations between sensory 
acuity and intelligence and measures of attainment, none of which lead 
to any psychologically acceptable conclusions. 





*The author wishes to recognize the contribution of Mr. Dan L. Adler in the 
construction and analysis of certain of these tests. 
The mental test conception of perception places it far beyond mere
sensory perception and calls upon the highest levels of the nervous 
system, tying them into a dynamic interrelation. It is an active 
process in more than a simple time-space notion, and implies a more 
genuinely dynamic interpretation. Individual differences are thus 
seen to be due primarily to the fact that the perceived situation to 
which the reaction is made differs from individual to individual. 
Furthermore, for a given individual the same stimulus may be differ- 
ently perceived upon successive occasions depending upon attendant 
conditions.

Based upon the premise that the highest intellectual processes are 
expressed in the perception of and reaction to the dynamic interrela- 
tionships of wholes and parts, a series of tests have been constructed 
employing the medium of reading comprehension. The tests have 
been called Comprehension Maturity Tests because they furnish 
qualitatively varying levels of response which are indicative of the 
maturity of reading comprehension. In contrast with the usual 
objective tests, the maturity tests cover a narrower field, but attempt 
to measure the student’s depth and breadth of understanding and 
integration of given material. However, this does not of necessity 
mean that a maturity-type test must always represent a narrow sam- 
pling of the field. 

Whereas the usual type of test assumes comprehension or knowl- 
edge to be of an all-or-none nature, the maturity test assumes that 
there are gradations of comprehension of even a very simple idea. 
These gradations may range from a superficial acquaintance with some 
outstanding detail to a deep integration and comprehension from which 
the individual may extract fundamental principles. This type of test 
demands that the student evaluate the material before him, and dis- 
criminate among several responses of varying quality instead of calling 
forth a rote memory response. The psychological approach to the 
concept of maturity in reading is based upon demonstrable individual 
differences in the ability to perceive the interrelationships of ideas. 
This ability conditions all reading and is a basic manifestation of 
intelligence. 

In keeping with the hypotheses set forth for experimental purposes, 
the first Reading Comprehension Maturity Test (abbreviated to 
RCM E-1) was constructed as follows: 

Part I.—An excerpted passage from H. H. Newman’s lecture on 
evolution to University of Chicago freshmen. The items purported 
to test comprehension of factual material. 
Part II.—An excerpted passage from Rousseau’s Confessions,
dealing with his religious beliefs. The items were constructed in an 
effort to test appreciation. 

Part III.—An excerpted passage from Schopenhauer’s essay On 
Education. The items purported to test ability to make inferences.

Subjects were instructed to read through the entire passage before 
attempting the test items. Each of the test items was based directly 
upon a specific paragraph in the passage. Numbers of the items corre- 
sponded directly to the numbered paragraphs of the reading material. 
The test items were arranged in groups of four responses. In each 
group of the factual items one statement was false, one indicated the 
grasp of a single outstanding detail, one indicated a more complete 
comprehension of the paragraph, and one a complete summary of the 
paragraph made into a general statement. For the appreciation type 
of reading a similar gradation was effected. The inference items began 
with a simple inference and proceeded to more complex ones, the 
attempt always being made to lead the student to think beyond the 
material immediately before him and envisage it in its wider significance.
In each case the subjects were instructed to write a “B”
before the best statement and a “W” before the worst or false statement.
The gradation of items was checked by a group of technically
trained persons.

The test was administered to over seven hundred freshmen in the 
College of Liberal Arts. Reliability computed by the chance-halves 
method and stepped up by the Spearman-Brown prophecy formula, 
using a random sample of one hundred, was .88 when both “best” and 
“worst” answers were used. This coefficient dropped slightly when
only “best” answers were used.

In order to determine the discriminating power of each response 
within each item, the average score on the entire test of all students 
who selected a given response was computed. An item was considered 
satisfactory when the progression of averages followed this general 
pattern: average score of those who called the best response “best,”
136.5; average score of those who called the second best response
“best,” 128.7; average score of those who called the poorest true
response “best,” 95.6. Departures from this order, such as the second
best response receiving a higher average than the best response,
were found to indicate ambiguity of a response, some special difficulty,
or lack of adequate differentiation between responses within a given
item. Similar relationships hold for the false or “worst” response.
This type of analysis indicates the efficiency of each individual item 
in discriminating the good from the poor students in terms of their
total performance on the test. It further permits the location of 
faulty or non-discriminating items, so that steps may be taken toward 
improving them. Item analyses in terms of achievement criteria 
have yielded almost identical results. 
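The item analysis amounts to grouping students by the response they marked “best” on an item and averaging their total scores. A minimal sketch, assuming each student's choice and total score are available as parallel lists; the response labels and function names are illustrative, not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def average_total_by_choice(choices, totals):
    """Mean total-test score of the students who marked each response "best".

    choices: the response each student marked "best" on one item
    totals:  each student's total score on the whole test (same order)
    """
    groups = defaultdict(list)
    for choice, total in zip(choices, totals):
        groups[choice].append(total)
    return {choice: mean(scores) for choice, scores in groups.items()}


def shows_progression(averages, ranking=("best", "second", "poorest_true")):
    """True when the averages fall in the intended order of the responses."""
    values = [averages[r] for r in ranking if r in averages]
    return all(higher > lower for higher, lower in zip(values, values[1:]))


# The pattern quoted in the text for a satisfactory item:
print(shows_progression({"best": 136.5, "second": 128.7, "poorest_true": 95.6}))  # True
```

An item for which `shows_progression` returns False would be flagged for the kind of revision the text describes.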

The interpart correlations yielded by this test are so small
($r_{12} = .31$; $r_{13} = .37$; $r_{23} = .43$) as to suggest that there is a
real distinction between the areas being measured, although apprecia-
tion and inference show somewhat greater relationship than appears in
the other two correlations.

There is the possibility, however, that these were spurious coeffi- 
cients of correlation due to the different reading materials rather than 
to a true distinction in types of test items and attitudes of those taking 
the test. To check this possibility the original test was revised in 
terms of the aforementioned item analyses and a new test (hereafter 
designated as RCM E-2X) was constructed on exactly the same prin- 
ciples, but with the following characteristics: 

1. Instead of three excerpted reading passages, the new form used 
only one reading passage,¹ fifteen paragraphs in length.

2. The items of the RCM E-2X were of two types only—those 
purporting to test comprehension of factual material, and those con- 
structed to test ability to make inferences. No items intended to test 
appreciation were constructed. Each of the two parts was composed 
of fifteen sets of responses numbered to correspond to the paragraphs 
in the reading material. 

3. The directions preceding each part of the test were reworded 
for greater simplicity and clarity. 

A preliminary form of the RCM E-2X was administered to twenty- 
five graduate students and instructors in psychology at the State 
University of Iowa, who ranked all four responses in each item. The 
final form of the test was determined after corrections were made for 
ambiguity and non-discrimination of items as indicated by progression 
scores of the sample group. In similar fashion the RCM E-1 test was 
subjected to revision in the light of the previously mentioned analyses. 

A third test (The Organization Test) was constructed to determine 
the silent reading rate when comprehension is extremely simple. 
The technique is an adaptation of Courtis’s procedure known as 

1 From Cantril, Hadley, and Allport, Gordon W.: The Psychology of Radio.
New York: Harper and Bros., 1935.

“differential testing.”¹ It was believed that this procedure could
be used to determine the influence of reading rate in the RCM tests. 

The Organization test consisted of seventy simple scrambled 
sentences, each made up of six words, no word being less than four 
or more than six letters in length. The sentences were scrambled by 
words and phrases, each sentence containing four numbered divisions. 
Subjects were instructed to rearrange the sentences correctly, indi- 
cating in spaces provided for the purpose the numerical order after 
correction. They were also instructed to put a circle around the word 
they were reading each time the examiner called “mark.” The
“mark” calls were made at the end of the first, second, third, fifth, 
and tenth minutes. 

Scores on the Iowa Silent Reading Test, which had been adminis- 
tered prior to the experiment, were obtained in order to permit com- 
parisons between the maturity type tests and a more traditional test 
of reading comprehension. 

The subjects for this experiment were members of the class in 
elementary psychology at the University of Iowa. The class was 
composed approximately as follows: fifty-five per cent sophomores, 
thirty-five per cent juniors, five per cent seniors, five per cent graduate 
students and unclassified. However, a more homogeneous sample of 
ninety-nine sophomores for whom complete data were available was 
used in computing the test statistics. 

Preceding the test period the writer attempted to establish the 
most favorable rapport possible. To rule out fatigue factors the tests 
were administered in two different periods. The RCM E-1 (Rev.) 
(total time, one hour) was given during one test period; the RCM 
E-2X (total time, forty-three minutes) and the Organization test 
(total time, ten minutes) were given during the second test period 
one week later. The time limits for each test permitted almost all 
of the subjects to finish. 

The somewhat lower interpart correlations ($r_{12} = .30 \pm .06$;
$r_{13} = .24 \pm .06$; $r_{23} = .29 \pm .06$; $N = 99$) for the revised form of the
RCM E-1 corroborate the earlier findings. The slight decrease in
interpart correlations may probably be attributed to a better demarcation
of the original reading “types” resulting from the revision.
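The ± figures attached to the coefficients are probable errors (Table I labels them PE), and they agree with the classical formula PE = .6745(1 - r²)/√N. A short check, assuming that formula was the one used; the function name is illustrative.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Interpart correlations of the revised RCM E-1, N = 99.
for r in (0.30, 0.24, 0.29):
    print(f"r = {r:.2f} ± {probable_error_r(r, 99):.2f}")
```

The same formula reproduces the other probable errors reported for N = 99, for example ± .04 for r = .60 and ± .07 for r = .10.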

1 Courtis, S. A.: “Maturation Units for the Measurement of Growth.” Sch.
and Soc., Vol. xxx, Nov. 16, 1929, pp. 683-690; and “The Prediction of Growth.”
J. Educ. Res., Vol. xxvi, March, 1933, pp. 481-492.
Experimentation with a variety of weightings for the RCM E-2X
test showed the following to yield the highest reliability: 
A correlation of .41 ± .06 was obtained between the two parts of
the RCM E-2X test; this coefficient is especially significant in view of 
the fact that the reading passage was common to both test parts. 
Although it is of greater magnitude than coefficients obtained with 
different reading materials, it is still low enough to substantiate the 
posited difference between reading for comprehension of facts and 
reading for inferential purposes. Table I summarizes the intercorrelations
between the two Comprehension Maturity Tests.


TABLE I.—INTERCORRELATIONS OF RCM E-1 (Rev.) WITH RCM E-2X

                                        RCM E-2X
RCM E-1 (Rev.)     Part I, r ± PE     Part II, r ± PE     Total, r ± PE

Part I              .44 ± .05          .25 ± .06           .40 ± .06
Part II             .36 ± .06          .21 ± .06           .382 ± .06
Part III            .48 ± .05          .45 ± .05           .54 ± .05
Total               .60 ± .04          .44 ± .05           .60 ± .04

N = 99

That the correlation of the RCM E-1 (Rev.) total with the RCM 
E-2X total is not higher may be partially accounted for by the absence 
in the latter of items testing appreciation. That this reasoning is 
justified is borne out by the low correlations between the appreciation 
part (Part II) of RCM E-1 (Rev.) and the parts of RCM E-2X.
Further, the analogous parts of the two tests have a closer relation- 
ship with each other than has the appreciation part with either. It 
is suggested, then, that reading for appreciation may be a type differ- 
ent from reading for factual information or inference. 

The coefficient of correlation of .25 between the factual part of the 
E-1 form and the inference part of the E-2X form suggests the distinction
between the two.
The Organization test, composed of extremely simple scrambled
sentences, offered little opportunity for making errors. The few 
made were due, for the most part, to the presence of items for which 
more than one correct rearrangement was conceivable. Scores obtained
in the five-minute interval were found to have greatest reliability
and validity and so have been used in the subsequent phases of the 
study. 

The Organization test, designed for purposes of partialling out the
influence of speed of reading in the RCM tests, yields the following
coefficients of correlation:


TABLE II.—COEFFICIENTS OF CORRELATION BETWEEN THE ORGANIZATION TEST
AND THE COMPREHENSION MATURITY AND IOWA SILENT READING TESTS

RCM E-1 (Rev.)                  RCM E-2X

Part I ......... .27 ± .06      Part I ................. .25 ± .06
Part II ........ .22 ± .06      Part II ................ .10 ± .07
Part III ....... .32 ± .06      Total .................. .20 ± .06
Total .......... .38 ± .06      Iowa Silent Reading .... .59 ± .04

From the correlations, it may be inferred that speed of reading as 
measured by the Organization test plays a relatively small part in 
determining comprehension maturity scores, but is a quite important 
determinant of scores on the Iowa Silent Reading test. 
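Partialling out reading speed can be illustrated with the standard first-order partial correlation formula. The article does not report partial coefficients, so this is only a sketch: it takes the correlation of RCM E-1 (Rev.) Part III with RCM E-2X Part II from Table I and their correlations with the Organization test from Table II, and the function name is hypothetical.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = RCM E-1 (Rev.) Part III, y = RCM E-2X Part II, z = Organization test
r = partial_r(r_xy=0.45, r_xz=0.32, r_yz=0.10)
print(f"{r:.2f}")
```

With these coefficients the correlation hardly changes (from .45 to about .44), which is in keeping with the small part the text assigns to reading speed.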

To check the validity of the distinctions posited in the construction 
of the RCM tests and the Organization test, a factorial analysis
employing the Thurstone technique was made. The following six
variables were studied: 


1. RCM E-1 Part I    —Factual material
2.         Part II   —Appreciation (identity)
3.         Part III  —Inference
4. RCM E-2X Part I   —Factual material
5.          Part II  —Inference
6. Organization test —Speed of reading involving simple comprehension


Factor loadings were determined by the center of gravity method.
The results, after rotation of the axes, are summarized as follows:' 

1. Variables 1 and 2 have a common basis, but they are either 
unreliable or endowed with a high degree of specificity. They are 
different from the other variables. 





1 The detailed summary of this work, including factor loadings, may be found 
in: Adler, Daniel L.: A Study of Intelligence as Manifested in Perception of Relation- 
ships in Reading. Unpublished Master’s thesis, University of Iowa. 
2. Variable 4 is closely related to variables 1 and 2, but probably
has somewhat more of 5 within it than have the latter. 

3. Variables 3 and 5 are characterized by high communality and 
less specificity. They are different from the other variables. 

4. If 3 and 5 test inference (as they purport to), 5 is the better 
test. 

5. Variable 3 has more of 6 in it than has 5. 

6. Six is definitely different from all other variables, but probably 
has some relationship to them; it is more closely related to some than 
to others.

7. It may be concluded that the data definitely tend to support the 
original assumptions. 
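The first step of the center of gravity (centroid) method can be sketched as follows: each variable's loading on the first factor is its column sum in the correlation matrix divided by the square root of the sum of all entries. This is an illustration under stated assumptions, not the study's analysis. The matrix is assembled from the revised interpart correlations, Table I, and Table II (assuming those were the inputs), the diagonal uses the largest off-diagonal coefficient in each column as a communality estimate, and only the first, unrotated factor is computed, whereas the published conclusions rest on rotated loadings reported in Adler's thesis.

```python
import math


def first_centroid_factor(r):
    """Loadings on the first centroid factor of a correlation matrix.

    r is a symmetric correlation matrix (list of lists). Each diagonal cell is
    replaced by the largest absolute off-diagonal value in its column (one
    common communality estimate). All coefficients are assumed positive, so
    no sign reflection is attempted. The input matrix is not modified.
    """
    n = len(r)
    m = [row[:] for row in r]
    for j in range(n):
        m[j][j] = max(abs(m[i][j]) for i in range(n) if i != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    return [s / math.sqrt(total) for s in col_sums]


# Variables 1-6 as numbered above; off-diagonal values from the text and Tables I-II.
R = [
    [1.00, 0.30, 0.24, 0.44, 0.25, 0.27],
    [0.30, 1.00, 0.29, 0.36, 0.21, 0.22],
    [0.24, 0.29, 1.00, 0.48, 0.45, 0.32],
    [0.44, 0.36, 0.48, 1.00, 0.41, 0.25],
    [0.25, 0.21, 0.45, 0.41, 1.00, 0.10],
    [0.27, 0.22, 0.32, 0.25, 0.10, 1.00],
]
for k, loading in enumerate(first_centroid_factor(R), start=1):
    print(f"variable {k}: {loading:.2f}")
```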

One of the most significant findings for the comprehension maturity 
tests has been the consistency with which any given individual tends 
to select responses of a certain level throughout a test. The superior 
individual tends consistently to select responses of the best level, 
whereas the inferior student tends to select most of his responses on 
the poorest level, not infrequently confusing false statements with 
true ones. The majority tend to select responses on the second-best 
level, occasionally dipping down into the poorest or false and occa- 
sionally managing to get the best response. This type of behavior 
has been noted in connection with every test of the maturity type 
which has been constructed. Illustrative of this phenomenon are the 
data in Table III. 

As the number of best responses decreases, the average number of 
next best responses increases. This movement seems to reach a 
plateau at about the level of fourteen best responses, and, from this 
point on, the average number of poorest true responses begins to take 
on significance. This movement is succeeded by a third at the level 
of about eight best responses. At this point the false responses begin 
to figure more heavily. These findings confirm a basic assumption mirrored
in the construction of these tests; namely, that the level of
response of any individual is determined by his level of relation-perceiving
ability.
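A table of the kind described, giving the average number of responses on each level for subjects grouped by their count of best responses, could be tabulated as follows. A minimal sketch; the level names, the record layout, and the example counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical level names, from best to worst.
LEVELS = ("best", "next_best", "poorest_true", "false")


def averages_by_best_count(records):
    """Group subjects by their number of best responses and average the
    number of responses they chose on every level.

    records: one dict per subject mapping each name in LEVELS to a count.
    Returns groups ordered from the most to the fewest best responses.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["best"]].append(rec)
    return {
        best: {level: mean(r[level] for r in group) for level in LEVELS}
        for best, group in sorted(groups.items(), reverse=True)
    }


# Hypothetical counts for three subjects on a fifteen-item part.
records = [
    {"best": 12, "next_best": 3, "poorest_true": 0, "false": 0},
    {"best": 12, "next_best": 2, "poorest_true": 1, "false": 0},
    {"best": 6, "next_best": 5, "poorest_true": 3, "false": 1},
]
for best, avgs in averages_by_best_count(records).items():
    print(best, avgs)
```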


SUMMARY AND CONCLUSIONS 


The assumption that individual differences are due to the fact that 
the perceived situation to which a reaction is made differs from indi- 
vidual to individual has been made the basis of a new technique in 
mental measurement. This technique involves differentially graded 
responses, and has been applied to college adult subjects, using reading
as the testing medium. 


TABLE III.—AVERAGE NUMBER OF RESPONSES ON EACH LEVEL WHEN GROUPED
ACCORDING TO BEST RESPONSES
The experimental findings have revealed two apparently distinct
types or attitudes of reading with a third type indicated but not
clearly distinguished from the other two as yet. Reading for information
and reading for inference were posited at the outset and have been
shown by factorial analysis to be relatively independent. Reading
for appreciation bears a close relationship to reading for inference,
but has also certain factor loadings not yet completely identified. The
influence of rate of reading upon comprehension scores may be effectively
controlled by the adaptation of the differential testing technique
which was employed.

Analysis of the responses reveals a marked tendency for most individuals
to react quite consistently upon that level of response which
characterizes the quality of their insight. Bearing in mind the dangers
attendant upon the use of the term “type” in psychology, it may be
said that there are characteristic types of readers as well as types of
reading.

In application the comprehension maturity tests have furnished the
Reading Clinic with a measure which seems more nearly to approach
pure comprehension than do other tests purporting to measure comprehension.
Used extensively by the Department of French as “depth”
measures of comprehension, the technique has been thoroughly 
studied and has yielded satisfactory results. The technique derives 
value for achievement tests in that it permits the discriminative and 
evaluative functions of intelligence to operate in the test situation as 
well as in the learning situation. 

Because perceptual ability is fundamental to all other mental 
functions, it may be expected to condition them. As growth occurs, 
with normal mental and physiological development, individuals can 
be taught to perceive not only the simple relationships of their immedi- 
ate environment, but also the complex relationships of a science. 
Proficiency may be due not merely to the possession of sufficient 
mental power to master the principles involved, but to the basic 
ability to perceive the relationships involved. Whether this ability 
is an innate and unmodifiable element, or one responsive to training 
within physiological limits, is only one of the problems suggested for 
further experimentation. Study of related changes in higher mental 
processes due to changes in perceptual level may also contribute to our 
understanding of the nature and function of the mind. 
A SUGGESTED METHOD FOR MEASURING THE
EFFECTIVENESS OF TEACHING INTRODUCTORY COURSES


HENRY BEAUMONT 
University of Kentucky 


The purpose of this study was to devise a means of determining the 
relative effectiveness attained by different instructors in the same 
department in teaching introductory courses, on the basis of the sub- 
sequent achievement of their students who took advanced work in 
that department. Six of the instructors who taught sections of ele- 
mentary psychology at the University of Kentucky during the years 
1930-1931, 1931-1932, and 1932-1933 served as subjects. During 
the six semesters and three summer sessions covered by this survey, 
two thousand four hundred ninety-three individuals had been enrolled 
in their introductory sections. This group had produced seven 
hundred twenty-eight advanced students in psychology, who took 
twelve hundred fifteen advanced courses in the department. Com- 
plete records were obtained on each student’s performance in intro- 
ductory and advanced work, and each student was credited to the 
instructor who had been in charge of his elementary section. Students 
who had been enrolled in the two-semester introductory course and 
had changed instructors after the first semester, were credited to both 
instructors; similarly, instructors were given double credit for students 
who had been enrolled in their section during both semesters. This 
made possible the following classification of our material. 

TABLE I.—DISTRIBUTION OF THE MATERIAL

Introductory   Per cent of    Per cent of    Per cent of    Number of
instructor     introductory   advanced       advanced       advanced
               enrollment     students       enrollment     courses per
                              prepared       supplied       student
K              14.44           8.52          10.94          2.14
L              21.74          23.49          17.20          1.22
M              17.21          20.74          17.94          1.44
N              20.46          18.82          13.91          1.23
O              12.51          12.50          22.30          2.99
P              13.64          15.93          17.69          1.85

This table shows considerable differences among the instructors.
Three of them (L, M, and P) had prepared a greater proportion of
advanced students than would have been expected on the basis of the
proportion of elementary students registered in their sections, but in
the other three cases the share of advanced students was smaller than
might have been expected. Similarly, instructors M, O, and P
supplied a larger percentage of the total advanced enrollment than
was to be expected on the same basis, while the other three supplied
smaller proportions. The discrepancies between these two measures
show up clearly in the last column, which lists the average number
of advanced courses taken by those students in each introductory
section who took such work in the department.
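
The last column of Table I can be recovered from the two columns before it. The Python sketch below assumes, as the description of the material suggests but does not state outright, that the column equals the number of advanced course enrollments implied by the enrollment percentage (out of 1,215) divided by the number of advanced students implied by the prepared percentage (out of 728). The dictionary layout and function name are illustrative only.

```python
# Totals given in the text: advanced students, and advanced course enrollments.
TOTAL_ADVANCED_STUDENTS = 728
TOTAL_ADVANCED_COURSES = 1215

# Table I, by introductory instructor:
# (per cent of advanced students prepared, per cent of advanced
#  enrollment supplied, printed number of advanced courses per student)
table_i = {
    "K": (8.52, 10.94, 2.14),
    "L": (23.49, 17.20, 1.22),
    "M": (20.74, 17.94, 1.44),
    "N": (18.82, 13.91, 1.23),
    "O": (12.50, 22.30, 2.99),
    "P": (15.93, 17.69, 1.85),
}


def courses_per_student(pct_students, pct_courses):
    """Implied advanced course enrollments divided by implied advanced students."""
    students = pct_students / 100 * TOTAL_ADVANCED_STUDENTS
    courses = pct_courses / 100 * TOTAL_ADVANCED_COURSES
    return courses / students


recomputed = {
    name: courses_per_student(pct_s, pct_c)
    for name, (pct_s, pct_c, _printed) in table_i.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}, printed {table_i[name][2]:.2f}")
```

Read this way, the recomputed values match the printed ones to within two hundredths. The largest gap is for instructor O (2.98 recomputed against 2.99 printed), which rounded percentages would readily explain.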

A more significant means of differentiation is qualitative and based
on the grades in introductory sections obtained by students who took
advanced work in the department. Table II gives this information in
terms of the percentage of students making each passing grade in the
different sections who later registered for advanced courses in
psychology.


TABLE II.—PERCENTAGE OF INTRODUCTORY STUDENTS TAKING ADVANCED WORK

Introductory                     Grades in elementary course
instructor                  A        B        C        D       A-D
K                         33.33    31.31    14.00     4.26    17.22
L                         55.17    35.43    37.17    23.23    31.55
M                         52.00    38.75    36.41    31.67    35.20
N                         38.78    29.85    30.57    24.67    26.86
O                         57.69    18.37    31.79    35.85    29.17
P                         59.46    34.85    32.35    47.37    34.12
Average for department    49.41    32.43    30.38    27.84    28.35





If it is true that one of the functions of an introductory course is to
motivate and prepare students of superior promise for advanced work
in the department, a larger proportion of A and B students than of
“just passing” students might be expected to continue their studies in
the field. The above table shows that this was true of the department
as a whole, but not of all individual instructors. These men
differed widely, not only in the percentage of their students at each
grade level who continued their work in psychology,
but also in the percentage of their introductory students as a group
who took advanced courses. In the latter respect, instructor M
did twice as well as his colleague K (35.20 against 17.22 per cent).
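
One way to make this claim concrete is to ask whether the rate of continuing in psychology falls steadily from grade A to grade D. The text defines no such criterion; it is our own choice, offered only as an illustration. The sketch below applies it to the figures of Table II and also computes the ratio of M's overall rate to K's.

```python
# Table II: per cent of students at each grade (A, B, C, D) who went on to
# advanced work, followed by the per cent for all passing grades (A-D).
table_ii = {
    "K": ([33.33, 31.31, 14.00, 4.26], 17.22),
    "L": ([55.17, 35.43, 37.17, 23.23], 31.55),
    "M": ([52.00, 38.75, 36.41, 31.67], 35.20),
    "N": ([38.78, 29.85, 30.57, 24.67], 26.86),
    "O": ([57.69, 18.37, 31.79, 35.85], 29.17),
    "P": ([59.46, 34.85, 32.35, 47.37], 34.12),
    "Department": ([49.41, 32.43, 30.38, 27.84], 28.35),
}


def declines_steadily(by_grade):
    """True if each lower grade continues at a strictly smaller rate than the one above."""
    return all(upper > lower for upper, lower in zip(by_grade, by_grade[1:]))


steady = sorted(
    name for name, (by_grade, _all) in table_ii.items() if declines_steadily(by_grade)
)
ratio_m_to_k = table_ii["M"][1] / table_ii["K"][1]

print("Steady decline from A to D:", steady)
print(f"M's overall rate is {ratio_m_to_k:.2f} times K's")
```

Under this criterion, only the department average and instructors K and M show an uninterrupted decline from A to D. M's overall rate is a little over twice K's.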

In order to determine the effect of intelligence on this situation,
all students who were registered in the College of Arts and Sciences
and who had remained in school until graduation were selected from each
section. In this manner, a small but homogeneous group was obtained
which contained no members who had dropped out of school and none
who had been required to take advanced work in psychology. The
average intelligence deciles of these six groups and the proportions of
their students taking advanced work are shown in Table III.


TABLE III.—THE AVERAGE INTELLIGENCE OF STUDENTS IN INTRODUCTORY
SECTIONS AND THE PERCENTAGE OF STUDENTS TAKING ADVANCED WORK

Introductory    Average                Per cent taking
instructor      intelligence decile    advanced work
K               4.40                   33
L               3.90                   52
M               3.96                   63
N               4.85                   50
O               4.77                   50
P               3.60                   57











The rank order coefficient of correlation between the average 
intelligence decile of the Arts and Sciences students in each section and 
the proportion of them which took advanced work in the department 
was found to be R = .64 ± .11. Though the number of cases was too
small to permit valid conclusions, the size of this correlation suggests 
that there were other factors besides intelligence which accounted for 
the fact that these students registered for advanced courses in unequal 
proportions. 
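
A rank-order coefficient of this kind can be computed with the classical rank-difference formula, $\rho = 1 - 6\sum d^2 / n(n^2 - 1)$. The Python sketch below applies that formula to the figures of Table III, giving tied values the mean of their ranks. The text does not say how the tie between N and O was ranked, or in which direction the deciles run. The sketch is therefore illustrative and does not reconstruct the author's arithmetic. With deciles ranked from low to high and mid-ranks for the tie, it gives a coefficient of about .61 in absolute value, with a negative sign, close to but not identical with the printed .64.

```python
# Table III figures, in the order K, L, M, N, O, P.
instructors = ["K", "L", "M", "N", "O", "P"]
avg_decile = [4.40, 3.90, 3.96, 4.85, 4.77, 3.60]
pct_advanced = [33, 52, 63, 50, 50, 57]


def mid_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values as far as it goes.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = mid_ranks(xs), mid_ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


rho = spearman_rho(avg_decile, pct_advanced)
print(f"rho = {rho:.2f}")
```

With ties present, the rank-difference formula is only an approximation to the correlation between the ranks themselves. In so small a sample, the choice of tie handling visibly moves the result.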

Another explanation of these differences might be the percentage of
students in the different sections for whom advanced psychology
courses constituted a requirement. This percentage ranged from
37.00 to 53.86, with a mean of 43.96, so that roughly one-half of the
group was required to take one or more additional courses in the
department. That differences in the proportion of students required to
take advanced work did not influence the proportion of introductory
students actually taking such courses may be concluded from Table
IV, which lists the percentage of all students and of Arts and Sciences
students enrolled in each beginning section who continued to take
courses in the department. For the latter group of students, advanced
courses were elective rather than required.

It is particularly important to note that the rank order of the
instructors is essentially the same in either case, N and O being tied
in the second ranking. The Pearson coefficient of
correlation between the percentage of all students and that of Arts
and Sciences students taking advanced courses was found to be
r = .97 ± .16. Obviously, then, the proportion of students in an
introductory section which was required to take advanced courses did
not account for the differences in proportion of students in the various
sections actually taking advanced work in psychology.
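
The product-moment coefficient can be checked directly from the two percentage columns of Table IV. The sketch below is a plain deviation-score computation. The variable names are illustrative, and the six instructors are treated as the cases, as the text implies.

```python
from math import sqrt

# Table IV: per cent of all students, and of Arts and Sciences students,
# in each introductory section who took advanced work.
all_students = {"K": 17, "L": 32, "M": 35, "N": 27, "O": 29, "P": 34}
arts_and_sciences = {"K": 33, "L": 52, "M": 63, "N": 50, "O": 50, "P": 57}


def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


names = sorted(all_students)
r = pearson_r([all_students[k] for k in names], [arts_and_sciences[k] for k in names])
print(f"r = {r:.2f}")  # → r = 0.97
```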


TABLE IV.—PERCENTAGE OF INTRODUCTORY STUDENTS TAKING ADVANCED WORK

Introductory    All         Rank     Arts and Sciences    Rank
instructor      students    order    students             order
K               17          6        33                   6
L               32          3        52                   3
M               35          1        63                   1
N               27          5        50                   4.5
O               29          4        50                   4.5
P               34          2        57                   2

















In addition to the quantitative measures obtained by the above 
method, it was of interest to determine the quality of advanced work done by
students prepared in each of the introductory sections. The following 
table indicates the percentage of students making better-than-average 
and below-average grades in advanced courses in psychology. 


TABLE V.—PERCENTAGE OF STUDENTS MAKING BETTER-THAN-AVERAGE AND
BELOW-AVERAGE GRADES IN ADVANCED COURSES

Introductory              Grades in advanced work
instructor                A and B      D and E
K                         59.39         9.03
L                         56.89         9.57
M                         59.63         7.79
N                         51.48        12.43
O                         60.89        11.81
P                         60.93        14.42
Average for department    58.20        10.84











Stated in another way, Table VI shows the same situation in terms
of the percentage of advanced grades obtained by students who had
been enrolled in the various introductory sections, compared with
the proportion of the advanced enrollment supplied by those sections.
The percentages in columns two and three of Table VI are
significant when compared with those in column four, in that they show
whether each instructor's students obtained more or fewer above-average
and below-average grades than would be expected on the basis
of their numbers in advanced courses.


TABLE VI.—PERCENTAGE OF BETTER-THAN-AVERAGE AND
BELOW-AVERAGE GRADES OBTAINED BY ADVANCED STUDENTS

Introductory    Grades in advanced work    Percentage of advanced
instructor      A and B      D and E       enrollment supplied
K               11.11         9.02         10.94
L               16.74        15.04         17.20
M               18.28        12.78         17.94
N               12.23        15.78         13.91
O               23.31        24.06         22.30
P               18.42        23.31         17.69
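
The comparison described for Table VI amounts to dividing each instructor's share of a grade category by his share of the advanced enrollment. A ratio above 1.0 means more such grades than his numbers alone would predict. The sketch below uses the Table VI figures. The ratio formulation is our illustrative choice and not a procedure the author states.

```python
# Table VI: (per cent of all A and B grades, per cent of all D and E grades,
#            per cent of advanced enrollment supplied), by introductory instructor.
table_vi = {
    "K": (11.11, 9.02, 10.94),
    "L": (16.74, 15.04, 17.20),
    "M": (18.28, 12.78, 17.94),
    "N": (12.23, 15.78, 13.91),
    "O": (23.31, 24.06, 22.30),
    "P": (18.42, 23.31, 17.69),
}


def representation(share_of_grades, share_of_enrollment):
    """Ratio above 1.0: more of these grades than the enrollment share predicts."""
    return share_of_grades / share_of_enrollment


over_ab = sorted(k for k, (ab, _de, enr) in table_vi.items() if representation(ab, enr) > 1)
over_de = sorted(k for k, (_ab, de, enr) in table_vi.items() if representation(de, enr) > 1)

for k, (ab, de, enr) in table_vi.items():
    print(f"{k}: A/B ratio {representation(ab, enr):.2f}, D/E ratio {representation(de, enr):.2f}")
```

On these figures, K, M, O, and P are over-represented among A and B grades, while N, O, and P are over-represented among D and E grades.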














These tables show significant differences among the instructors 
with respect to the quality of work done by their students after they 
enrolled in advanced courses. In other words, students in certain 
introductory sections were more likely than those in others to make 
high grades in advanced courses in psychology. That intelligence
alone is not an adequate explanation of this situation is suggested by the
rank order coefficient of correlation between the average intelligence
decile of the members of the introductory sections who took advanced
courses and their standing in those courses, which was R = .20 ± .26.
Even though the number of cases was too small to permit valid con- 
clusions, there appears to be a clear indication that these differences 
cannot be attributed solely to intellectual discrepancies. 

A composite picture of the situation may be obtained by combining 
the measures discussed above in tabular form, after equalizing the
numbers of students in each introductory section. This has been done 
in Table VII. 

The information obtained by means of this method may be vari- 
ously interpreted, depending primarily on the aim and the prevailing 
point of view of the department concerned. If the number of advanced 
students is considered most important, instructors M and P would be 
considered superior, and K distinctly inferior. If the emphasis is on 
the number of majors (each advanced student taking as many courses
as possible), O and P would be most successful, K and N least.
Instructors O and P excelled in the proportion of their students
making A and B grades in advanced work, while N was decidedly
below average in this respect. On the other hand, O and P also
excelled in the proportion of their students making D and E grades.
Finally, if the quality of beginning students who continue their work
in psychology is taken into consideration, more of P’s superior students 
were in this group than any other instructor’s, but this is also true of 
P’s average students. Instructor K ranked last in respect to superior 
as well as average students taking advanced work. From this, it 
appears possible to conclude that, if teaching methods are responsible 
for the proportion of students continuing their work in the same sub- 
ject, such methods motivate above-average and average students 
alike. 


TABLE VII.—THE RECORDS OF STUDENTS IN INTRODUCTORY COURSES
(Based on One Hundred in Each Section)

                                           Introductory instructor
Item                                       K    L    M    N    O    P   Average
A and B students taking advanced work     10   15   13   12    8   13     12
C and D students taking advanced work      7   17   22   15   21   21     17
Total number taking advanced work         17   32   35   27   29   34     29
Number of advanced courses taken          40   43   55   37   96   70     54
A and B grades in advanced courses        24   25   32   19   59   43     31
D and E grades in advanced courses         4    4    4    5   11   10      6
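
Several rows of Table VII appear to follow from the earlier tables. The sketch below checks three such relations under stated assumptions. First, the "Total" row is the A-D column of Table II rounded to whole numbers. Second, the two grade rows are the course counts multiplied by the Table V percentages. Third, the first two rows sum to the third. The first-row entry for M is taken as 13, the value implied by that row sum. Nothing here is claimed to be the author's actual computation.

```python
INSTRUCTORS = ["K", "L", "M", "N", "O", "P"]

# Table II, A-D column: per cent of each section taking any advanced work.
pct_taking_advanced = dict(zip(INSTRUCTORS, [17.22, 31.55, 35.20, 26.86, 29.17, 34.12]))
# Table V: per cent of advanced grades that were A or B, and D or E.
pct_ab_grades = dict(zip(INSTRUCTORS, [59.39, 56.89, 59.63, 51.48, 60.89, 60.93]))
pct_de_grades = dict(zip(INSTRUCTORS, [9.03, 9.57, 7.79, 12.43, 11.81, 14.42]))

# Table VII rows, per hundred introductory students (M's first-row entry as 13).
table_vii = {
    "ab_students": dict(zip(INSTRUCTORS, [10, 15, 13, 12, 8, 13])),
    "cd_students": dict(zip(INSTRUCTORS, [7, 17, 22, 15, 21, 21])),
    "total": dict(zip(INSTRUCTORS, [17, 32, 35, 27, 29, 34])),
    "courses": dict(zip(INSTRUCTORS, [40, 43, 55, 37, 96, 70])),
    "ab_grades": dict(zip(INSTRUCTORS, [24, 25, 32, 19, 59, 43])),
    "de_grades": dict(zip(INSTRUCTORS, [4, 4, 4, 5, 11, 10])),
}


def check_table_vii(tolerance=1):
    """Return (instructor, check) pairs that fail; an empty list means all pass."""
    failures = []
    for k in INSTRUCTORS:
        row = {name: values[k] for name, values in table_vii.items()}
        if row["ab_students"] + row["cd_students"] != row["total"]:
            failures.append((k, "row sum"))
        if round(pct_taking_advanced[k]) != row["total"]:
            failures.append((k, "total vs Table II"))
        predicted_ab = row["courses"] * pct_ab_grades[k] / 100
        predicted_de = row["courses"] * pct_de_grades[k] / 100
        if abs(round(predicted_ab) - row["ab_grades"]) > tolerance:
            failures.append((k, "A and B grades"))
        if abs(round(predicted_de) - row["de_grades"]) > tolerance:
            failures.append((k, "D and E grades"))
    return failures


print(check_table_vii() or "all checks pass")
```

All three checks pass. The grade rows agree to within one unit, which rounding of the per-hundred course counts would allow; with zero tolerance, a few entries (for example L's A and B grades) miss by one.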


























Whatever interpretation be used, it is obvious that considerable 
individual differences existed among the six instructors with respect 
to the quantity and quality of advanced work done by students pre- 
pared in their introductory sections. It is readily admitted that many 
extraneous factors account in part for these differences, but student 
intelligence and course requirements appear not to be the most impor- 
tant of these. However, the present study is not concerned with 
possible explanations, but merely with an attempt to determine the 
effectiveness of teaching introductory courses. When used judiciously 
and interpreted cautiously, the technique employed here may be useful 
as an aid in judging the effectiveness of teaching attained by individual 
instructors on a more objective and factual basis than that afforded by 
the conventional methods. 
RETROACTIVE INHIBITION AND COMMON SENSE


STEUART HENDERSON BRITT* 
The George Washington University 


In 1935 the present writer reviewed all the experimental and 
theoretical literature which had been published through December, 
1934, on retroactive inhibition.(1) He has since become impressed 
with the possibility that a large number of the conclusions from the 
experiments were in close agreement with what one would expect on a 
common sense basis. This does not imply that the precise quantita- 
tive relations between the various determining conditions of retroactive 
inhibition (similarity, temporal position, time interval, etc.) and the 
amount of retroactive inhibition could be determined in advance. It 
does imply that many of the general relations between the determining 
conditions and the amount of retroactive inhibition may be anticipated 
by naive persons, that is, persons without formal training in psy-
chology.‡ If this should prove true, the results might be useful for
the guidance of instructors in teaching experimental and educational 
psychology. 

Accordingly, the objectives of the present study were three-fold: 

(1) To determine the extent to which investigators of retroactive 
inhibition have demonstrated qualitative relations which are reason- 
ably apparent to naive, intelligent persons on a common sense basis. 

(2) To find the correlation between general intelligence and ability
to decide on a common sense basis the fundamental qualitative rela- 
tions of those variables which affect retroactive inhibition. 

(3) To find the correlation between general ability in psychology
and ability to decide on a common sense basis the fundamental qualita- 
tive relations of those variables which affect retroactive inhibition. 


I. PROCEDURE


A summary of conclusions based on experimental data was given 
in the 1935 review (1, pp. 423-427). Twenty of these conclusions were 
selected for study in the present investigation. 





* The writer is indebted to Professor Frank M. Weida for valuable critical 
suggestions. 

† Throughout this paper the term common sense is used to mean, “judgments
and conclusions based upon past experience.”

‡ The term naive is used throughout to mean, “without formal training in
psychology.”
After preliminary investigations with naive persons, some of the
twenty conclusions were reworded so as to make them more easily 
understood. The twenty conclusions were then presented in multiple 
choice form, with two possible choices. As an example, one of the 
conclusions of the original review was: 


With lists of adjectives for learning material, no consistent variation in 
amount of retroaction with time interval has been demonstrated. 


This statement was reworded and presented in multiple choice form 
in the present study as: 


With lists of adjectives for learning material, there is (a, no) consistent 
variation in amount of retroactive inhibition as the time interval between 
original learning and relearning increases. 


The order of precedence of the two choices in each of the twenty 
statements was: affirmative before negative, increase before decrease, 
more before less. The following test was given in a three-page 
mimeographed form to one hundred seventeen students in the writer’s 
two sections of Elementary Psychology in October, 1936: 








Name ________  Age ____  Sex ____  Year in College ____
Have you ever had any courses in Psychology or in Education? ____ If
the answer is “Yes,” give the names of the courses here:








Retroactive Inhibition 


Read and study this page carefully until you are sure that you understand it.
Here are two imaginary situations: 

Situation 1.—Suppose that you learn one activity thoroughly, and then
afterward learn another activity thoroughly. Now, suppose that you try to 
relearn the first (original) activity. 

Situation 2.—Suppose that you learn thoroughly the same original activity 
as in Situation 1—you are learning this for the first time, just as in Situation 1. 
Then, however, instead of learning another (interpolated) activity, you rest or 
engage in any usual type of activities for a period of time equal to that which 
you would have required to learn the interpolated activity. Now, suppose 
that you try to relearn the first (original) activity. 

You will probably not be able to relearn the original activity as readily 
in Situation 1 as in Situation 2. The learning of the interpolated activity 
has resulted in retroactive inhibition; that is, it has interfered with the retention 
of the original activity. 

Retroactive inhibition, then, may be defined as the detrimental influence of 
an interpolated activity (that is, a second activity) upon the retention of an 
original activity. NOTE particularly that it is the retention of the original 
activity, and not the original learning of this activity, which is involved in the
concept of retroactive inhibition. As a hypothetical case, if you study French 
(original activity) for an hour and thereafter turn to an hour’s study of 
Italian (interpolated activity), your ability to recall the French (original 
activity) will probably be less than it would have been had you substituted an 
hour’s interval of no learning activity in place of the hour’s study of Italian. 


Diagram of Situations 1 and 2

(Situation 1): Learn first (original) activity, e.g., French.
  → Learn another (interpolated) activity, e.g., Italian.
  → Relearn original activity (French).

(Situation 2): Learn first (original) activity, e.g., French; this means that you
  are learning this for the first time, just as in Situation 1.
  → Rest, or engage in any usual type of activities for a period of time equal
    to that which you would have required to learn the interpolated activity
    (Italian).
  → Relearn original activity (French).


Read and reread the previous page until you are sure that you understand
it. In other words, be sure that you understand what is meant by retroactive
inhibition BEFORE you attempt to fill in the answers below. This is
important.

Now, there are two choices open to you:

Choice 1.—In case you are not absolutely sure that you understand the
meaning of the term “retroactive inhibition,” make no attempt to fill in the
answers below, but simply turn in your paper. This will not count against
you.

Choice 2.—If you do understand the meaning of the term “retroactive
inhibition,” read carefully each statement below and underscore the word or
words in parentheses in each statement which you consider correct. Mark
every statement. You may have all the time that you wish.

Underscore the word or words in parentheses in each statement which you
consider correct:

1. The degree of similarity between the original activity and the inter-
polated activity (is, is not) important in determining the amount of retroactive
inhibition.

2. Retroactive inhibition (can, can not) occur when the interpolated
activity is introduced immediately after original learning.

3. Retroactive inhibition (can, can not) occur when the interpolated 
activity is introduced several weeks after original learning. 

4. With lists of adjectives for learning material, there is (a, no) consistent
variation in amount of retroactive inhibition as the time interval between
original learning and relearning increases.

5. Susceptibility to retroactive inhibition tends to (increase, decrease) as
the amount of material for the original activity is increased.

6. In general, the greater the degree of learning of the original activity, 
the (more, less) susceptible is the learning to retroactive inhibition. 

7. The amount of retroactive inhibition tends to (increase, decrease) with 
the degree of learning of the interpolated activity, although not proportion- 
ately, provided the interpolated learning is not too great. 

8. Practice or previous experience with the original learning material or 
with the experimental conditions under which retroactive inhibition is being 
measured is probably associated with (more, less) susceptibility to retroactive 
inhibition. 

9. When a period of four hours or of eight hours of sleep, rather than an 
equal period of waking, intervenes between learning and recall of nonsense 
syllables, the differences in amounts of retention favor (waking over sleep, 
sleep over waking). 

10. Original learning in a waking state and interpolated learning in a 
hypnotic trance, or original learning in a hypnotic trance and interpolated 
learning in a waking state, probably result in (more, less) interference in 
recall than when both original and interpolated learning are in the same state. 

11. When both original learning and interpolated activity have occurred 
during waking, hypnosis for the retention test (affects, does not affect) the 
amount of retroactive inhibition materially. 

12. Attempts to decrease susceptibility to retroactive inhibition by the 
direct suggestion that the original material will be remembered well, and to 
increase susceptibility by the direct suggestion that it will not be remembered 
well, probably (have, have not) proved successful. 

13. Pleasant original learning material is (more, less) subject to retroactive
inhibition by indifferent interpolated material than is either indifferent or 
unpleasant original learning material. 

14. As compared with normal learning, electric shock during original 
learning apparently (increases, decreases) the amount of retroactive inhibition. 

15. As compared with normal learning, electric shock during interpolated 
learning apparently (increases, decreases) the amount of retroactive inhibition. 

16. Individual differences in susceptibility to retroactive inhibition (exist, 
do not exist) among adults. 

17. Consistent differences in susceptibility to retroactive inhibition (exist, 
do not exist) between the two sexes. 

18. There (are, are no) differences in susceptibility of school children to 
retroactive inhibition according to age. 

19. Retroactive inhibition (occurs, does not occur) in animals other than
man.

20. Retroactive inhibition (can, can not) occur when the retention of the
original activity is measured by the method of recognition, rather than by the
method of relearning or recall.

Please note here the number of minutes spent on this questionnaire ______
The experimenter emphasized that the completion of this
test had absolutely nothing to do with a student's grade in the course,
and also that if a student was not absolutely sure that he understood
the meaning of the term “retroactive inhibition,” he could adopt
Choice 1 and simply turn in his paper.

The test was conducted at the seventh lecture session in Elementary
Psychology; the topics of learning and memory were not assigned
or discussed until the thirty-first session. The papers of twelve
students who indicated that they had previously had courses in
psychology or in education, and of six students who adopted Choice 1,
were eliminated. Thus, the experimental group was reduced to ninety-nine
subjects who could be considered naive as to the concept of retroactive
inhibition.

The ninety-nine papers were scored as to number of correct answers, 
and later both intelligence test scores and psychology grades were 
recorded. The intelligence test scores were based on the 1936 edition 
of the American Council on Education Psychological Examination for 
Freshmen. The psychology grades were the numerical grades in 
Elementary Psychology at the completion of the course; the instructor 
made out these final grades without knowledge of the scores on the 
retroactive inhibition study. 

Coefficients of correlation were then determined between the scores 
on the retroaction study and intelligence test scores; between scores 
on the retroaction study and psychology grades; and between intelli- 
gence test scores and psychology grades. 
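
Scoring the papers for number correct, and splitting each paper into halves for the reliability coefficient reported below, can be sketched as follows. The answer key is taken from the correct answers listed in Table I. Splitting at items 1-10 versus 11-20 is our reading of "the first half of the test correlated with the second half". The answer sheets scored at the end are hypothetical illustrations, not data from the study.

```python
# Items 1-20: the two printed choices and the keyed answer (from Table I).
ITEMS = [
    (("is", "is not"), "is"),
    (("can", "can not"), "can"),
    (("can", "can not"), "can"),
    (("a", "no"), "no"),
    (("increase", "decrease"), "decrease"),
    (("more", "less"), "less"),
    (("increase", "decrease"), "increase"),
    (("more", "less"), "less"),
    (("waking over sleep", "sleep over waking"), "sleep over waking"),
    (("more", "less"), "more"),
    (("affects", "does not affect"), "does not affect"),
    (("have", "have not"), "have not"),
    (("more", "less"), "less"),
    (("increases", "decreases"), "decreases"),
    (("increases", "decreases"), "decreases"),
    (("exist", "do not exist"), "exist"),
    (("exist", "do not exist"), "do not exist"),
    (("are", "are no"), "are no"),
    (("occurs", "does not occur"), "occurs"),
    (("can", "can not"), "can"),
]


def score(answers):
    """Return (total, items 1-10, items 11-20) correct for one answer sheet."""
    if len(answers) != len(ITEMS):
        raise ValueError(f"expected {len(ITEMS)} answers, got {len(answers)}")
    hits = []
    for number, (answer, (choices, key)) in enumerate(zip(answers, ITEMS), start=1):
        if answer not in choices:
            raise ValueError(f"item {number}: {answer!r} is not one of {choices}")
        hits.append(answer == key)
    return sum(hits), sum(hits[:10]), sum(hits[10:])


# Hypothetical sheets, for illustration only.
perfect_sheet = [key for _choices, key in ITEMS]
first_choice_sheet = [choices[0] for choices, _key in ITEMS]
print(score(perfect_sheet))       # → (20, 10, 10)
print(score(first_choice_sheet))  # → (8, 5, 3)
```

The second sheet shows that, because of the fixed order of the choices, a subject who always underscored the first word would get eight items right.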


II. RESULTS


If the subjects had marked the retroactive inhibition test on a purely
chance basis, we should expect that of the twenty statements,
approximately ten would be marked correctly and approximately ten 
would be marked incorrectly. Actually, the mean number of answers 
marked incorrectly by the ninety-nine subjects was 7.40 (SD = 1.92). 

Since there were ninety-nine subjects, there were ninety-nine 
answers to each of the twenty items. On the basis of pure chance, we 
should expect half of these answers, or 49.5, to be marked correctly 
and the other half to be marked incorrectly. Only five of the state- 
ments—4, 5, 11, 14, and 18—were marked incorrectly by over half 
of the subjects. The other fifteen statements were marked correctly 
by over half of the subjects. Table I shows the correct answer to
each of the twenty items, the number of subjects answering it correctly,
and the extent to which the difference between the number obtained
and the number expected by chance may be considered reliable.
TABLE I.—CORRECT ANSWERS
(Ninety-nine Subjects)

                              No. of Ss        Numerical value of    Critical ratio
                              answering        difference from       (X - 49.5)/4.97¹
No.  Correct answer           correctly (X)    chance (X - 49.5)
 1   is                       76                26.5                  5.33
 2   can                      94                44.5                  8.95
 3   can                      67                17.5                  3.52
 4   no                       34               -15.5                 -3.12
 5   decrease                 31               -18.5                 -3.72
 6   less                     64                14.5                  2.92
 7   increase                 78                28.5                  5.73
 8   less                     89                39.5                  7.95
 9   sleep over waking        61                11.5                  2.31
10   more                     60                10.5                  2.11
11   does not affect          40                -9.5                 -1.91
12   have not                 54                 4.5                  0.91
13   less                     88                38.5                  7.75
14   decreases                25               -24.5                 -4.93
15   decreases                57                 7.5                  1.51
16   exist                    96                46.5                  9.36
17   do not exist             71                21.5                  4.33
18   are no²                  19               -30.5                 -6.14
19   occurs                   87                37.5                  7.55
20   can                      59                 9.5                  1.91








¹ That SD = 4.97 was arrived at by the formula $SD = \sqrt{N \times P \times Q}$, where
P and Q represent the fractions of right answers and of wrong answers, respectively,
which could be expected on the basis of chance. Thus,
$SD = \sqrt{99 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 4.97$.

² The correctness of this answer may be questioned in view of Lahey's (3) recent
study, in which she concludes: “With brightness held constant, susceptibility to
retroaction decreases with increasing chronological and/or mental age . . .” (p. 87).
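
The chance model behind Table I can be sketched in a few lines. Following footnote 1, each two-choice item is assumed to be answered correctly by chance with probability one-half. The expected number correct is then 49.5 and the standard deviation 4.97. Dividing each item's departure from 49.5 by that rounded SD reproduces the last column. The threshold of three standard deviations used to flag items is a conventional one, applied here only for illustration.

```python
from math import sqrt

N = 99            # subjects answering each item
P = Q = 0.5       # chance of a right (P) or wrong (Q) answer on a two-choice item
CHANCE_MEAN = N * P                 # 49.5 correct answers expected by chance
SD = round(sqrt(N * P * Q), 2)      # 4.97, rounded as in footnote 1

# Column X of Table I: number of subjects answering items 1-20 correctly.
correct_counts = [76, 94, 67, 34, 31, 64, 78, 89, 61, 60,
                  40, 54, 88, 25, 57, 96, 71, 19, 87, 59]


def critical_ratio(x):
    """Departure from the chance expectation in units of the chance SD."""
    return round((x - CHANCE_MEAN) / SD, 2)


ratios = [critical_ratio(x) for x in correct_counts]
# Items whose departure from chance is at least three SDs, in either direction.
beyond_three = [item for item, cr in enumerate(ratios, start=1) if abs(cr) >= 3]

print(ratios)
print("Items at or beyond 3 SD:", beyond_three)
```

By this yardstick, thirteen items depart from chance by three SDs or more. Four of them (4, 5, 14, and 18) depart in the direction of the wrong answer.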





The r between number of correct answers and intelligence test
scores* was +.19 (PE = .07). The r between number of correct
answers and psychology grades† was +.17 (PE = .07).‡ The coeffi-
cient of reliability of the test instrument (the first half of the test
correlated with the second half) was -.02 (PE = .07).§





* The Mean of the intelligence test scores was 207.06 (SD = 2.73).

† The Mean of the psychology grades was 72.49 (SD = 2.39).

‡ The r between intelligence test scores and psychology grades was +.57
(PE = .05).

§ The “split-half technique” is described in Garrett, H. E.: Statistics in
Psychology and Education, 2d ed., Longmans, Green, 1937, pp. 318-319.
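
The probable errors reported for these coefficients are consistent with the usual formula $PE_r = 0.6745\,(1 - r^2)/\sqrt{N}$, taking N = 99. The text does not say which formula was used, so the sketch below should be read as a check made under that assumption.

```python
from math import sqrt


def probable_error_r(r, n):
    """Probable error of a product-moment r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


N = 99
reported = {  # label: (r, printed PE)
    "correct answers vs. intelligence scores": (0.19, 0.07),
    "correct answers vs. psychology grades": (0.17, 0.07),
    "first half vs. second half": (-0.02, 0.07),
    "intelligence scores vs. psychology grades": (0.57, 0.05),
}

for label, (r, printed_pe) in reported.items():
    print(f"{label}: r = {r:+.2f}, PE = {probable_error_r(r, N):.3f} (printed {printed_pe})")
```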
III. SUMMARY


A three-page mimeographed form was presented to one hundred 
seventeen students in Elementary Psychology at the beginning of the 
course. This form contained an explanation of retroactive inhibition, 
and twenty experimentally verified conclusions about retroactive 
inhibition in multiple choice form, with two possible choices. The 
one hundred seventeen subjects were reduced to ninety-nine by the 
elimination of twelve who had previously had courses in psychology 
or in education, and of six who did not turn in answers. 

Intelligence test scores for the ninety-nine subjects were based on 
the 1936 edition of the American Council on Education Psychological 
Examination for Freshmen. Final psychology grades were the numeri- 
cal grades at the completion of the course in Elementary Psychology. 

The mean number of answers marked incorrectly was 7.40 
(SD = 1.92). Fifteen of the twenty statements were marked correctly 
by over half of the subjects. Statistically, the results were of high 
reliability. 

An r of +.19 showed no great degree of linear relation between 
ability to make common sense judgments about retroactive inhibition 
and general intelligence, as measured by intelligence tests. Likewise 
an r of +.17 gave no indication of any significant degree of linear 
relation between such common sense judgments about retroactive 
inhibition and general ability in psychology, as represented by grades 
in Elementary Psychology. The shortness and unreliability of the
test instrument help to explain these low correlations.

The results indicate that naive subjects of at least freshman rank 
in college have the ability to make rather accurate judgments as to 
many of the more general determining conditions of retroactive inhibi- 
tion.* This conclusion may be of significance to instructors in teach- 
ing experimental and educational psychology. 


* This does not mean that naive subjects understand the implications of these
determining conditions, for example, the theories of retroactive inhibition (2).
THE PROCEDURE OF MATCHED CASES—A CAUTION


HORACE ENGLISH 


Ohio State University 


If we desire to discover the relationship of variable $X_1$ (the independent
variable) to variable $Y$ (the dependent variable), it is clear
that we must somehow eliminate from our consideration any concurrent
effects on $Y$ of variables $X_2, X_3, X_4, \ldots, X_n$. Otherwise the
obtained values of $Y$ may reflect $X_2$, $X_3$, $X_4$, etc., instead of $X_1$. The
procedure of matched or equated cases or groups is an attempt thus to
eliminate $X_2, X_3, X_4, \ldots, X_n$. The logic relied on is this: If all our
comparisons relate to cases in which $X_2, X_3, X_4, \ldots, X_n$ are held
constant, then observed changes in $Y$ are due to the observed or
manipulated changes in $X_1$.

The procedure is well-illustrated in a recent study of visual defects
(the independent variable, $X_1$) as related to reading (the dependent
variable, $Y$). In this study the pupils were matched in pairs for
intelligence ($X_2$), school progress ($X_3$), and chronological age ($X_4$).
They were matched thus on the valid assumption that $X_2$, $X_3$, and $X_4$
may very possibly influence reading achievement and that their
influence should, therefore, be held constant. There is latent, however,
another assumption, one which is not by any means so obviously
valid; namely, that variable $X_1$ is unrelated to $X_2$, $X_3$, and $X_4$; that is,
that visual defect is unrelated to intelligence, school progress, or CA.
As a matter of fact, visual defect is probably correlated with all three.

And it is this which renders intelligible the at first rather astonishing 
conclusion of the investigation, which was that visual defect does not 
have a detrimental effect on reading achievement. 

Let us consider only the case of intelligence. Suppose, as is
legitimate and indeed plausible, that intelligence is adversely affected
by poor eyesight. Then the group of handicapped children who have
“normal” intelligence for age despite the handicap are not a random
group to be fairly compared with another group of the same intelligence
without the handicap. They are, instead, highly selected not
only for defective vision, $X_1$, but also for some other variable, let
us say perseverance or courage. In other words, since they have
“what it takes” to surmount their visual handicap in developing
intelligence, is it so surprising to find that they have also surmounted
it as regards reading?
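
The selection effect can be made concrete with a deliberately simple toy population. It is entirely hypothetical and drawn from no study. Each child either has or lacks a visual defect and either has or lacks high perseverance. In the model, the defect lowers intelligence, perseverance raises both intelligence and reading, and the defect has no direct effect on reading. Comparing all defective with all normal-sighted children shows the handicap. Matching on "normal" intelligence leaves only the persevering defective children, and the comparison then makes the defect look beneficial.

```python
from itertools import product
from statistics import mean

# Hypothetical population: every combination of (defect, perseverance)
# occurs once; 1 = visual defect present / perseverance high.
population = list(product((0, 1), repeat=2))


def intelligence(defect, perseverance):
    # Assumed: the defect lowers measured intelligence, perseverance raises it.
    return 1 - defect + perseverance


def reading(defect, perseverance):
    # Assumed: no direct effect of the defect on reading; it acts only through
    # intelligence, while perseverance also helps reading directly.
    return intelligence(defect, perseverance) + perseverance


def reading_gap(cases):
    """Mean reading of defective minus mean reading of normal-sighted cases."""
    defective = [reading(*c) for c in cases if c[0] == 1]
    normal = [reading(*c) for c in cases if c[0] == 0]
    return mean(defective) - mean(normal)


overall_gap = reading_gap(population)
# "Matching": keep only children of normal intelligence for age (here, 1).
matched = [c for c in population if intelligence(*c) == 1]
matched_gap = reading_gap(matched)

print(f"unmatched gap: {overall_gap}, matched gap: {matched_gap}")
```

In this extreme toy, matching does more than conceal the handicap: the gap changes sign from -1 to +1. A less extreme model would merely shrink it.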
To summarize: In the procedure of matched cases, there is grave
danger that by “holding constant” a certain number of variables, we
shall illicitly introduce the effect of other variables. Put another 
way around, the procedure assumes that the independent variable is 
not correlated with the variables according to which the cases are 
equated. This assumption is generally not examined, and in current 
experimentation it is often flagrantly invalid. 

There is no easy remedy. We could fall back on the method of 
partial correlation, but the most ardent advocates of statistical analysis 
generally concede its inferiority to a properly conceived experimental 
analysis. (Of course, the implied dichotomy of statistics vs. experi- 
ment is misleading; these represent merely limiting cases of a single 
analytic procedure.) And in this case, unless we are very careful, 
we shall make essentially the same logical error. If we “hold constant”
statistically the effects of X₂, X₃, and X₄, we shall almost certainly
remove from the variance of Y part that is due to X₁. In
short, we shall partial out too much.¹ The writer believes, personally,
in a cautious employment of the “good old law of averages.” He 
would suggest, to continue with the problem just discussed, that we 
study the reading of very many children with defective vision—of 
such a number that intelligence, except insofar as it depends on vision, 
may safely be assumed to be normally distributed. If, then, we find 
that visual handicap restricts intelligence and that that in turn 
adversely affects reading skill, it is legitimate and indeed necessary to 
accept this derivative effect (in addition to any direct influence) 
as a genuine result of the visual limitations. 

One point more. The foregoing discussion has been pitched at that 
level of scientific thinking where the effect of one variable, all others 
being considered equal, is the desideratum. But in all branches of 
biology it is becoming daily apparent that we must rise above this 
stage. We must learn to consider what happens to the relation 
between two variables in a field of other freely varying variables. 
An emphasis upon this type of relation is among the chief contributions 
of Gestalt to contemporary thinking. Going back once more to our 
case of visual handicap, do we as psychologists seriously think that a 
given degree of visual defect imposes upon the individual a constant 
proportional handicap? Have we never heard of compensatory
responses? Must we not rather hold that, under certain circumstances
and for certain purposes, a visual defect may not be a handicap at all
but a positive asset? We have long recognized such complex relations
in clinical practice; it is time that we take seriously the need to develop
scientific procedures which do justice to these facts. But that, as our
grandmothers always tell us, is another story.

¹ Compare on this point the argument of Barbara Burks.
EVALUATION OF TEST ITEMS BY THE METHOD OF
ANALYSIS OF VARIANCE* 


JOSEPH LEV 
New York City 


I 


A procedure will here be outlined which will serve to validate and 
at the same time weight test items in objective tests. The test items 
considered are of the multiple choice type in which the responses are 
graded to show degree of reaction. All the items contain the same 
number, k, of responses and are graded in the same way. This includes 
for k = 2 the dichotomous case in which items are marked right or 
wrong. 

The evaluation of the items is based on the results of giving the 
test to a group of subjects. If criterion scores independent of the test 
are available for these subjects, they may be used in the method to be 
described. If these are not available, criterion scores may be obtained 
from the test which is being studied by assigning values to the totality 
of responses of each subject based upon weights arbitrarily assigned. 
A suitable system of weights for the responses $1, 2, \ldots, k$ is $0, 1, \ldots, k - 1$,
respectively, the same weights for each item. In the case of items
presenting a choice between two responses the wrong is weighted zero 
and the right, one. 
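A minimal Python sketch of this preliminary scoring follows. It assumes, purely for illustration, that each subject's answers are stored as a list of response numbers 1 to k, with None marking an omitted item; the function name `preliminary_score` is hypothetical.

```python
from typing import List, Optional


def preliminary_score(responses: List[Optional[int]], k: int) -> int:
    """Criterion score from the arbitrary weights 0, 1, ..., k - 1.

    responses[i] is the response number (1..k) chosen on item i, or None
    if the item was omitted; omitted items add nothing to the score.
    """
    total = 0
    for r in responses:
        if r is None:
            continue  # omission: no contribution
        if not 1 <= r <= k:
            raise ValueError(f"response {r} is outside 1..{k}")
        total += r - 1  # response r receives weight r - 1
    return total


# Hypothetical subject: five three-response items, the fourth omitted.
print(preliminary_score([1, 3, 2, None, 3], k=3))  # 0 + 2 + 1 + 2 = 5
```

For dichotomous items (k = 2) this reduces to counting right answers, as the text describes.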

This method of scoring must be considered as only preliminary in 
nature. A method of obtaining more suitable weights will be described 
later. 

Criterion scores having been obtained, the validity of items may 
be judged in terms of their consistency with these scores. The choice 
of response should in general correspond to the value of the criterion 
score of the subject making the choice. For convenience the responses 
will be supposed graded in increasing order from 1 to $k$. Hence for a
valid item, the subjects choosing the first response should have the 
lowest criterion scores, the subjects choosing the second response the 
next to lowest criterion scores, and so on to those choosing the kth 
response who should have the highest scores. 





* Acknowledgment is made to the United States Works Progress Administration
for the City of New York for assistance rendered under Project Number
465-97-3-102.
The validity of an item is then determined in terms of its power to
discriminate between the groups just described. For the subjects upon
whose responses the test is validated this power is easily determined.
It is desirable, however, to ascertain whether the discriminatory power
will hold true for other samples. This involves a test of statistical
significance, which will be described below.

The question of discriminatory power presents additional diffi- 
culties when the items have more than two responses. An item may, 
for example, be very successful in discriminating between groups 
choosing responses 1 and 2, but not at all successful in the case of 
groups choosing 2 and 3. A method of dealing with this problem will 
be proposed. This method will yield a set of weights for the responses.

The mathematical basis for dealing with these problems and 
methods of calculation will be described. 


II 


The comparison of groups of criterion scores can best be carried out
in terms of their means. Suppose that $N$ individuals have taken the
entire test. Because of omissions, the number answering a particular
item, say the $p$th, may be less than $N$. Call this number $N_p$. These
individuals distribute themselves into $k$ groups according to the choice
of response. Denote the number choosing the first response by $N_1$,
the number choosing the second by $N_2$, and so on to the number choosing
the $k$th, which is denoted by $N_k$. These values satisfy the relation

$$N_1 + N_2 + \cdots + N_k = N_p.$$

The total sum of the criterion scores for each group may then be
obtained and denoted by $T_1, T_2, \ldots, T_k$, corresponding to the choices of
response. Finally the means $M_1$ to $M_k$ are obtained.

The first criterion for validity of items may now be stated in precise
mathematical form. The means must increase in value in the
order $M_1, M_2, \ldots, M_k$. Some deviations from this rule may be
permitted. If, for example, $M_3$ is only slightly greater than $M_2$, but $M_3$
is much greater than $M_1$, the item is still useful and may be accepted
for further consideration. The order of mean values may suggest the
appropriate response order.
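The tabulation of Section II and the ordering check just stated can be sketched in Python as follows. The data layout (parallel lists of responses and criterion scores, with None for an omission) and the function names are assumptions made for illustration. `means_increase` applies the criterion strictly; it does not attempt the judgment the text allows for small departures.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Stats = Dict[int, Tuple[int, float, float]]  # response -> (N_j, T_j, M_j)


def group_stats(item: List[Optional[int]], criterion: List[float]) -> Stats:
    """Count, total and mean criterion score for each response to one item."""
    counts: Dict[int, int] = defaultdict(int)
    totals: Dict[int, float] = defaultdict(float)
    for r, x in zip(item, criterion):
        if r is None:
            continue  # subject omitted this item, so is not among the N_p
        counts[r] += 1
        totals[r] += x
    return {r: (counts[r], totals[r], totals[r] / counts[r]) for r in sorted(counts)}


def means_increase(stats: Stats) -> bool:
    """Strict form of the first criterion: M_1 < M_2 < ... < M_k."""
    means = [m for _, _, m in stats.values()]
    return all(a < b for a, b in zip(means, means[1:]))


# Hypothetical item answered by five of six subjects.
stats = group_stats([1, 2, 3, 2, None, 3], [10, 14, 20, 16, 12, 22])
print(stats)                  # {1: (1, 10.0, 10.0), 2: (2, 30.0, 15.0), 3: (2, 42.0, 21.0)}
print(means_increase(stats))  # True
```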

An item satisfying the criterion just described has shown discrim- 
inatory power of the appropriate kind for the sample. The question of 
discriminatory power for the population will be considered in the next 
section. 
III


The means having been found to differ in the sample, it is necessary 
to determine whether this difference may have arisen as a result of 
random sampling. This leads to the statistical test of significance
known as the analysis of variance as described by R. A. Fisher¹ and
G. W. Snedecor.²

The method consists of the comparison of two variances (squared 
standard deviations). 

The first variance to be considered is that between the means of
the groups. For its formulation, the additional notation $M_p$, to
stand for the mean score of all individuals answering the $p$th item, is
needed. Then the variance between means, $V_b$, is given by

$$V_b = \frac{N_1(M_1 - M_p)^2 + N_2(M_2 - M_p)^2 + \cdots + N_k(M_k - M_p)^2}{k - 1}.$$
The variance between means shows the extent to which the means 
differ among themselves. 

To establish a standard of significance for this variance, it is compared
to another known as the variance within groups. The latter
arises from the deviations of the individual scores from their respective
group means. Denoting generally the individual scores by $X$, the
deviations in the first group are $X - M_1$, in the second $X - M_2$, and
so on. The variance within groups, $V_w$, then is

$$V_w = \frac{\sum (X - M_1)^2 + \sum (X - M_2)^2 + \cdots + \sum (X - M_k)^2}{N_p - k},$$

where each sum is taken for all scores in the corresponding group.
The ratio between these variances

$$F = \frac{V_b}{V_w}$$

may be used in estimating the discriminatory power of an item. $F$
has a known distribution. For each value of $F$ the probability can be
determined that a value at least as large should have arisen as a result of
random sampling. If this probability is sufficiently low, considerable
assurance is given that the variation found among the means is due to
reasons other than random sampling. The result is then taken to be
important enough to be used as a reason for continuing the use of the
item in the test.

¹ Fisher, R. A.: Statistical Methods for Research Workers. London: Oliver and Boyd,
1936.

² Snedecor, G. W.: Calculation and Interpretation of Analysis of Variance and
Covariance. Ames, Iowa: Collegiate Press, Inc., 1934.
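As a sketch under illustrative assumptions (criterion scores grouped by the response chosen, at least two non-empty groups, more subjects than groups), the defining formulas for V_b, V_w and F translate directly into Python. The name `anova_direct` is hypothetical, and the probability attached to F is not computed here; it would still be read from a table such as Snedecor's.

```python
from typing import List, Tuple


def anova_direct(groups: List[List[float]]) -> Tuple[float, float, float]:
    """Return (V_b, V_w, F) computed straight from the defining formulas.

    groups[j] holds the criterion scores X of the subjects who chose
    response j + 1.  Assumes k >= 2 non-empty groups and N_p > k.
    """
    k = len(groups)
    n_p = sum(len(g) for g in groups)
    m_p = sum(sum(g) for g in groups) / n_p          # M_p, mean of all answering
    means = [sum(g) / len(g) for g in groups]        # M_1, ..., M_k
    # Variance between means: sum of N_j (M_j - M_p)^2, divided by k - 1.
    v_b = sum(len(g) * (m - m_p) ** 2 for g, m in zip(groups, means)) / (k - 1)
    # Variance within groups: sum of (X - M_j)^2 over every group, divided by N_p - k.
    v_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n_p - k)
    return v_b, v_w, v_b / v_w


print(anova_direct([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1.0, 13.5)
```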

The probability which is to be used as a standard of acceptance for 
items is arbitrary. Probability values usually taken as criteria are .05 
and .01. Values of F for each of these probabilities corresponding to 
different values of k and N are given on page 88 of the book by Snedecor 
previously mentioned. In this table criterion values of F appropriate 
to the sample at hand are found in the (k-1)th column and (N-k)th 
row, two values at each point. The lesser value corresponds to a 
probability of .05 and the greater to a probability of .01. Since the 
probabilities decrease as F increases, the values of F show the relative 
discriminatory power of the items in the test. 

For k = 2, the value of F is the square of Student’s Ratio discussed 
in the book by Fisher, mentioned above. This ratio is a form of the 
familiar critical ratio, used to test the significance of the difference 
between means. The method of analysis of variance is very useful in 
the case k = 2, since it presents a particularly simple method of calcu- 
lation of the critical ratios. 
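The relation F = t² for k = 2 can be checked numerically. In the sketch below (hypothetical helper names, invented scores), Student's ratio is computed with the pooled within-group variance, which for two groups is exactly V_w, and its square is compared with F.

```python
import math
from typing import List


def student_t(g1: List[float], g2: List[float]) -> float:
    """Student's ratio for two groups, using the pooled within-group variance."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    pooled = ss / (n1 + n2 - 2)  # this is V_w when k = 2
    return (m2 - m1) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def f_two_groups(g1: List[float], g2: List[float]) -> float:
    """F = V_b / V_w for k = 2 (so the divisor of V_b is k - 1 = 1)."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    m_p = (sum(g1) + sum(g2)) / (n1 + n2)
    v_b = n1 * (m1 - m_p) ** 2 + n2 * (m2 - m_p) ** 2
    v_w = (sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)) / (n1 + n2 - 2)
    return v_b / v_w


wrong, right = [1, 2, 3], [4, 5, 6, 7]  # hypothetical criterion scores
t, f = student_t(wrong, right), f_two_groups(wrong, right)
print(round(t * t, 6), round(f, 6))  # 15.0 15.0
```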

For the actual calculation of F a procedure will be described which 
is much simpler than the direct application of the formulas above. 

Certain constants useful for all items may be calculated at the
beginning. These are $T = \sum X$, the total of criterion scores of all
individuals taking the test, and $S = \sum X^2$, the total of squares of all
criterion scores.

For items answered by all subjects, these constants may be used
directly. For items omitted by some subjects, the values corresponding
to the missing cases must be deducted. In either case, the totals
just described for a given item, say the $p$th, may be denoted by $T_p$
and $S_p$.

Combining this notation with that previously described, the
required variances may be calculated by means of the formulas

$$V_b = \frac{T_1 M_1 + T_2 M_2 + \cdots + T_k M_k - T_p M_p}{k - 1}$$

and

$$V_w = \frac{S_p - (T_1 M_1 + T_2 M_2 + \cdots + T_k M_k)}{N_p - k}.$$

$F$ is the ratio $V_b / V_w$.
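The shortcut formulas need only the group totals, the group sizes, T_p and S_p, since T_j M_j = T_j²/N_j and T_p M_p = T_p²/N_p. A minimal Python sketch (function name hypothetical) follows; on the toy groups [1, 2, 3] and [4, 5, 6] it gives V_b = 13.5, V_w = 1.0 and F = 13.5, the same as the defining formulas.

```python
from typing import List, Tuple


def anova_shortcut(t_j: List[float], n_j: List[int],
                   t_p: float, s_p: float) -> Tuple[float, float, float]:
    """(V_b, V_w, F) from group totals T_j, group sizes N_j, T_p and S_p.

    t_p and s_p must be the sum and the sum of squares of the criterion
    scores of just those subjects who answered the item.
    """
    k, n_p = len(t_j), sum(n_j)
    m_p = t_p / n_p
    sum_tm = sum(t * t / n for t, n in zip(t_j, n_j))  # T_1 M_1 + ... + T_k M_k
    v_b = (sum_tm - t_p * m_p) / (k - 1)
    v_w = (s_p - sum_tm) / (n_p - k)
    return v_b, v_w, v_b / v_w


# Toy data, groups [1, 2, 3] and [4, 5, 6]: T = 6, 15; N = 3, 3;
# T_p = 21; S_p = 1 + 4 + 9 + 16 + 25 + 36 = 91.
print(anova_shortcut([6, 15], [3, 3], 21, 91))  # (13.5, 1.0, 13.5)
```

Working from T_j²/N_j rather than from rounded means avoids carrying rounding errors in the M's into the subtraction.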
IV


Items yielding values of $F$ high enough to be acceptable require
further discussion.

The simplest case is the dichotomous, $k = 2$. Here it is more
convenient to work with the Student’s Ratio

$$t = \sqrt{F}.$$


For each item there is computed a value of $t$, and items are acceptable
only if their value of $t$ is greater than some standard value $t_0$.

The rejection of an item based upon $t$ being between 0 and $t_0$ may
be viewed as a procedure in scoring. The rejected item is scored zero
for both right and wrong answers and hence does not contribute to the
total score. In other words, both right and wrong answers are
assigned the same weight, zero.

The scoring for items having values of $t$ between 0 and $t_0$ having
been agreed upon, it is still necessary to decide upon a scoring system
for items having values of $t$ greater than $t_0$. The following procedure
is suggested as an extension of the logic used for rejection of items.
Assign the weight zero to the wrong response and the weight one to the
right response if $t$ is between $t_0$ and $2t_0$, two if $t$ is between $2t_0$ and $3t_0$,
and so on.
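The rule for dichotomous items amounts to counting how many whole multiples of t_0 the ratio t reaches. Below is a Python sketch with a hypothetical function name. The text says only “between”, so placing a value that falls exactly on a multiple of t_0 in the higher band is an assumption of this sketch.

```python
import math


def right_response_weight(t: float, t0: float) -> int:
    """Weight of the right response of a dichotomous item; the wrong one gets 0.

    t below t0 -> 0 (item rejected); t0 <= t < 2*t0 -> 1; 2*t0 <= t < 3*t0 -> 2; ...
    """
    if t0 <= 0:
        raise ValueError("t0 must be positive")
    return 0 if t < t0 else math.floor(t / t0)


for t in (1.5, 2.9, 4.8, 6.0):
    print(t, right_response_weight(t, t0=2.0))  # weights 0, 1, 2, 3
```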

A similar logic may be used in the consideration of items which
provide a choice among more than two responses. Consider first
the case of three responses. There are two mean differences, the
significance of which is to be considered. These are $M_2 - M_1$ and
$M_3 - M_2$. There are, therefore, two values of $t$:

$$t_1 = \frac{M_2 - M_1}{\sqrt{V_w\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}, \qquad t_2 = \frac{M_3 - M_2}{\sqrt{V_w\left(\frac{1}{N_2} + \frac{1}{N_3}\right)}}.$$

These values must exceed some standard for significance. The
difference $M_3 - M_1$ need not be considered.
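The two ratios, and their extension to the k − 1 adjacent differences treated in the following paragraphs, can be computed as below. This is an illustrative Python sketch with a hypothetical function name; the usage line feeds in the means, group sizes and V_w = 324 reported for item 18 in Section V.

```python
import math
from typing import List


def adjacent_t(means: List[float], counts: List[int], v_w: float) -> List[float]:
    """Ratios t_j = (M_{j+1} - M_j) / sqrt(V_w (1/N_j + 1/N_{j+1})), j = 1..k-1."""
    return [
        (means[j + 1] - means[j]) / math.sqrt(v_w * (1 / counts[j] + 1 / counts[j + 1]))
        for j in range(len(means) - 1)
    ]


# Item 18 of the Worries test in Section V: M_j, N_j and V_w = 324.
t1, t2 = adjacent_t([96.481, 108.980, 126.408], [27, 49, 49], 324)
print(round(t1, 1), round(t2, 1))  # 2.9 4.8
```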

The following procedure for weighting is suggested. Let the middle
response be weighted zero. If $t_1$ and $t_2$ are both less than $t_0$, responses
1 and 3 are also weighted zero and the item is rejected. This will
usually not be true for items which have satisfied the $F$ criterion. It
may, however, be true that one of the $t$’s is less than $t_0$ and the other is
not. Even if both $t$’s are greater than $t_0$, they are likely to be quite
different. This suggests the separate determination of weights for
responses 1 and 3 depending on the values of $t_1$ and $t_2$, respectively.
The weights for the first response are 0 for $t_1$ between zero and $t_0$, −1
for $t_1$ between $t_0$ and $2t_0$, −2 for $2t_0 < t_1 < 3t_0$, and so on. Likewise
weights of 0, 1, 2, and so on are assigned for corresponding values of $t_2$.

The generalization of the procedure to the case of $k$ greater than 3
is obvious. All the $k - 1$ differences $M_2 - M_1, M_3 - M_2, \ldots,
M_k - M_{k-1}$ are tested for significance by means of the ratios $t_1, t_2,
\ldots, t_{k-1}$. One of the middle responses is weighted zero. All the differences
are assigned weights as shown above. The differences to the
right of the zero get positive weights and those to the left negative.
The weight of any response is the sum of the weights between it and the
zero response. If $k$ is 5, for instance, the first response is weighted
negatively, its weight being determined by the sum of weights arising
from $t_1$ and $t_2$. The weight of the fifth response is positive and is
obtained from the sum of weights arising from $t_3$ and $t_4$. If $k = 4$, let
the second or third response be taken to be zero.
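A Python sketch of the general weighting rule follows. The names are hypothetical, responses are indexed from 0 in the code, and, as in the dichotomous case, a ratio falling exactly on a multiple of t_0 is assumed to go to the higher weight. The usage lines apply it with t_0 = 2 to the ratios reported for items 18 and 22 in Section V.

```python
import math
from typing import List


def step_weight(t: float, t0: float) -> int:
    """Weight of one adjacent difference: 0 below t0, then 1, 2, ... per multiple of t0."""
    return 0 if t < t0 else math.floor(t / t0)


def response_weights(ts: List[float], t0: float, zero: int) -> List[int]:
    """Weights of k responses from the k - 1 ratios t_1, ..., t_{k-1}.

    `zero` is the 0-based index of the response weighted zero.  steps[j] is the
    weight of the difference between responses j and j + 1 (0-based); a response
    to the right of the zero adds the steps in between, one to the left subtracts them.
    """
    steps = [step_weight(t, t0) for t in ts]
    weights = []
    for i in range(len(ts) + 1):
        if i >= zero:
            weights.append(sum(steps[zero:i]))
        else:
            weights.append(-sum(steps[i:zero]))
    return weights


print(response_weights([2.9, 4.8], t0=2, zero=1))  # [-1, 0, 2]
print(response_weights([4.1, 1.0], t0=2, zero=1))  # [-2, 0, 0]
```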


V


The procedures just described will be applied to the items in a test 
of Worries developed at Teachers College. The items require a choice 
among the three responses—often, sometimes, never. There are fifty- 
three items in the test. The test was given to one hundred twenty-five 
girls. Total scores were obtained for these subjects by assigning the 
weights 1 for often, 2 for sometimes, and 3 for never. 

The constants valid for all items were

$$T = \sum X = 14{,}139, \qquad S = \sum X^2 = 1{,}655{,}775.$$


For each item the values necessary in the computation were tabulated
as follows. For item 18 these values are

Response                       T       N        M
Often                      2,605      27   96.481
Sometimes                  5,340      49  108.980
Never                      6,194      49  126.408
All responses on item     14,139     125  113.11

From the last column it is clear that the means increase in
value from $M_1$ to $M_3$, so that the criterion of Section II is satisfied. It
is, therefore, necessary to go on to find the value of $F$. Since all subjects
have answered the item, the original constants can be used
without modification. Hence

$$T_p = 14{,}139, \qquad S_p = 1{,}655{,}775.$$

From these values

$$V_b = 8{,}498, \qquad V_w = 324, \qquad F = 26.2.$$


From Snedecor’s tables, it may be seen that for one hundred 
twenty-five cases and three responses, a value of F greater than 4.82 
arises from random sampling only once in a hundred times. Hence 
the value of F just found shows high discrimination. 
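The shortcut formulas can be checked against the item 18 figures. The Python sketch below (hypothetical function name) recomputes V_b, V_w and F from the published totals and constants; it gives about 8,483, 324 and 26.2. The printed V_b of 8,498 appears to reflect the rounded means used in the hand computation (for instance M_p = 113.11); the F value agrees to one decimal.

```python
from typing import List, Tuple


def anova_shortcut(t_j: List[float], n_j: List[int],
                   t_p: float, s_p: float) -> Tuple[float, float, float]:
    """(V_b, V_w, F) by the Section III shortcut, with T_j M_j = T_j**2 / N_j."""
    k, n_p = len(t_j), sum(n_j)
    sum_tm = sum(t * t / n for t, n in zip(t_j, n_j))
    v_b = (sum_tm - t_p * t_p / n_p) / (k - 1)   # T_p M_p = T_p**2 / N_p
    v_w = (s_p - sum_tm) / (n_p - k)
    return v_b, v_w, v_b / v_w


# Item 18: totals T_j and counts N_j for often, sometimes, never; no omissions.
v_b, v_w, f = anova_shortcut([2605, 5340, 6194], [27, 49, 49], t_p=14139, s_p=1655775)
print(round(v_b), round(v_w), round(f, 1))  # 8483 324 26.2
```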

The next step is to find suitable weights. The significance of the
differences $M_2 - M_1$ and $M_3 - M_2$ is given by the two ratios

$$t_1 = \frac{12.5}{18\sqrt{\frac{1}{27} + \frac{1}{49}}} = 2.9$$

and

$$t_2 = \frac{17.4}{18\sqrt{\frac{1}{49} + \frac{1}{49}}} = 4.8.$$

Using as a standard $t_0 = 2$, both differences are seen to be significant.
The weight for the “sometimes” response is taken to be zero.
The three weights may be assigned as

Often −1; Sometimes 0; Never 2.

For item 22 of the same test the results follow:
Response                       T       N        M
Often                      6,233      60   103.88
Sometimes                  4,576      38   120.42
Never                      3,140      25   125.60
All responses on item     13,949     123   113.41

The means $M_1$ to $M_3$ are in the proper order of value. To find $F$
it should be noticed that the item was omitted by two subjects. The
scores of these subjects are 94 and 96. Hence

$$S_p = \sum X^2 = 1{,}655{,}775 - (94)^2 - (96)^2 = 1{,}637{,}723.$$

From these values

$$V_b = 5{,}499, \qquad V_w = 373, \qquad F = 15.$$

The value of $F$ shows considerable discrimination. Hence it is
desirable to compute $t_1$ and $t_2$, and we find these to be 4.1 and 1.0.
The weights follow readily. The criterion for $t$ is taken to be
$t_0 = 2$ as before. Hence there is discrimination only between often
and sometimes. The weights are:

Often −2; Sometimes 0; Never 0.

In a similar manner all items in the test may be examined and
weights established for them.
A NOTE ON THE RELIABILITY OF THE KNAUBER ART
VOCABULARY TEST¹


JOSEPH E. MOORE 


George Peabody College for Teachers 


Testing in the field of art has left much to be desired. The failure
of testing can be attributed to two main causes: first, the function or
functions to be measured are very complex; second, the measuring
instruments have, in several cases, been poorly constructed. It can
be said to the credit of the authors of certain art tests that they have 
realized the limitations and weaknesses of their measuring devices and 
have been eager to improve them. Satisfactory tests in art depend 
to a large extent upon those who apply them and how adequately they 
report their findings. 

The day of uncritical acceptance of any test is rapidly passing. 
New tests will undoubtedly be studied and critically evaluated to 
determine their exact merits. The validity and reliability of meas- 
uring instruments will have to be demonstrated again and again before 
art teachers and supervisors will accept them as dependable aids in 
studying art students and their achievement. 

The Knauber Art Vocabulary Test with which this study is con- 
cerned is comparatively new. The author of the test does not give 
sufficient data for a comprehensive evaluation of the instrument. 
There is no information telling what sources were used to get the 
words for the vocabulary test. No data were given concerning the
exact method or methods used in sampling the sources for word count
or word appropriateness.


Lacking information concerning the basis for the Knauber test
and the method of developing the test, one is not sure what phase of
art the vocabulary represents. The writer has assumed that the
Knauber Art Vocabulary covers the terminology found in the field of
art. This study attempts to throw some light on the reliability of
the Knauber Art Vocabulary Test.





¹ The writer wishes to express his appreciation for the whole-hearted cooperation
during the course of this study of Professors George S. Dutch and Grace
Sabotka, and Mr. Frederic P. Giles of the Art Department of George Peabody
College for Teachers.
SUBJECTS


There were in all one hundred fifty-eight¹ subjects divided as
follows: twenty-three art majors, twenty-one art teachers, seven art
minors, and ninety-eight educational psychology students who had
never had any art other than a general art appreciation course. These
subjects were teachers and students at George Peabody College for
Teachers.

PROCEDURE 


All these groups were given the Knauber Art Vocabulary Test 
during the summer and fall quarters of 1937. 


RESULTS 


The reliability of the Knauber test, determined by correlating the scores on
the odd questions against the scores on the even questions for one
hundred fifty-eight subjects, was found to be .81 ± .02.² The reliability
coefficient by the Spearman-Brown formula for the odd-even
method was .90. One further check on the reliability of the test was
made by correlating the scores made on the first and last quarters of
the test with those made on the two middle quarters. The coefficient
of correlation was .89 ± .01 for the two halves of the test and .94 when
corrected by the Spearman-Brown formula.
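The Spearman-Brown correction steps a half-test correlation r up to the full test length, 2r/(1 + r); the general form for a test n times as long is nr/(1 + (n − 1)r). As a quick check on the figures reported, the short Python sketch below (function name hypothetical) reproduces .90 from .81 and .94 from .89.

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Reliability of a test n times as long as the part whose reliability is r.

    With n = 2 this steps a split-half correlation up to the whole test.
    """
    return n * r / (1 + (n - 1) * r)


for r_half in (0.81, 0.89):
    print(f"{r_half:.2f} -> {spearman_brown(r_half):.2f}")  # prints 0.81 -> 0.90, then 0.89 -> 0.94
```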

The reliability of a test, as measured by the odd-even method, is obtained
to a large extent by arranging the questions so that every other one is of
about equal difficulty. With this fact in mind, other measures to evaluate
the vocabulary test were tried out. The data on the other phases of the
test’s performance are presented in Table I.

The art majors earned slightly higher scores on the art vocabulary
test than did the art teachers, but the .55 of a point difference
between the means was not statistically significant. The art majors
were less variable in their responses than the teachers, as can be seen by
comparing the two standard deviations, 14.05 for the majors and 15.18 for
the teachers. It is highly probable that selective factors might have
made for a more homogeneous group among the majors and a less
homogeneous group among the teachers.

The art minors, with a standard deviation of 16.65, were much
more variable than either the teachers, the majors, or the non-art
subjects. The small number of subjects undoubtedly greatly affected
the standard deviation of the art minor group. In contrast with that
of the art minors, the standard deviation for the non-art subjects is
smaller, being 13.65.

¹ Nine of the subjects used in the section of the study dealing with reliability
were not used in subsequent studies because their other records were incomplete.

² A step interval of two was used in obtaining these coefficients.


TABLE I. THE MEDIAN, MEAN, STANDARD DEVIATION, STANDARD ERROR OF THE
MEAN, AND THE RANGE OF SCORES ON THE ART VOCABULARY TEST FOR
FOUR DIFFERENT GROUPS OF SUBJECTS¹

Subjects         No.   Median    Mean     SD    SE of mean   Range
Art majors        23    61.25    63.70   14.05     3.60      33-88
Art teachers      21    59.17    63.15   15.18     3.30      38-92
Art minors         7    52.50    49.65   16.65     6.28      18-73
Non-art           98    37.89    38.18   13.65     1.38      14-78

¹ The measures are based on selected population samples and must be
interpreted with the limitations this brings about clearly borne in mind.


The ranges of scores for the four groups are quite similar in many respects.
The art majors’ and teachers’ scores started in the thirties and extended
upward to within four points of each other (88 and 92). The
scores for the art minors and the non-art subjects started in the teens
and extended to within five points of each other in the seventies.
The range of scores of the non-art subjects was greater than that of
the art minor group.

The standard errors are almost the same for the art majors and 
teachers. The art minors have a much larger standard error than 
any of the other three groups. The standard deviation and the 
standard error indicate that the non-art group was less variable than 
any of the other three. 

A comparison of each group with all others is made in Table II
to show the reliability of the difference between the means and the
percentage of each group which reached or exceeded the median of the
others.


From Table II it will be seen that the difference between the means
is reliable in only two instances; namely, between the art majors and
the non-art students and between the art teachers and the non-art
group. The test, in so far as this sampling of students is representative,
does not make a reliable differentiation between those having
various amounts of art training. The critical ratios between art
majors and art teachers, between art majors and minors, and
between art teachers and minors are slightly more than half as large as
is necessary for statistical reliability.
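The critical ratio behind these comparisons is the difference between two means divided by the standard error of that difference. Below is a minimal Python sketch, assuming the groups are independent so that the two standard errors combine as the square root of the sum of their squares, applied to the Table I entries for art majors and non-art subjects (function name hypothetical).

```python
import math


def critical_ratio(mean1: float, se1: float, mean2: float, se2: float) -> float:
    """Difference between two means over the standard error of that difference.

    Assumes independent groups, so SE_diff = sqrt(SE1**2 + SE2**2).
    """
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Art majors vs. non-art subjects, means and standard errors from Table I.
print(f"{critical_ratio(63.70, 3.60, 38.18, 1.38):.2f}")  # 6.62
```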


TABLE II. A COMPARISON OF ART MAJOR, ART TEACHER, ART MINOR, AND
NON-ART SUBJECTS’ SCORES ON THE KNAUBER ART TEST, SHOWING THE
DIFFERENCE BETWEEN THE MEANS, STANDARD ERROR, CRITICAL RATIO,
AND PER CENT OF OVERLAPPING
[Table II body illegible in the source scan.]
One of the most helpful and practical means of comparing the discriminating
power of a test is the percentage of overlapping, that is, the
percentage of one group that reaches or exceeds the median of the
other. From Table II it can be seen that 48.6 per cent of the art
teachers reached or exceeded the median of the art majors. Twenty 
and nine-tenths per cent of the art minors reached or exceeded the 
median of the art teachers. The non-art students had only 3.36 per 
cent of their number reaching or exceeding the median score of the art 
teachers. Approximately the same percentage of art minors reach or 
exceed the median of the art majors and the art teachers. The per- 
centage of the non-art group reaching or exceeding the median of the 
art majors is only 3.07. 
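The percentage of overlapping, in the sense defined above, can be sketched in Python as follows. The scores are invented for illustration and are not the study's data; the function name is hypothetical. The sketch works on raw scores, so figures obtained by interpolation in a grouped frequency distribution may differ slightly.

```python
from statistics import median
from typing import Sequence


def percent_reaching_median(group: Sequence[float], reference: Sequence[float]) -> float:
    """Per cent of `group` scoring at or above the median of `reference`."""
    cut = median(reference)
    return 100.0 * sum(1 for x in group if x >= cut) / len(group)


# Hypothetical raw scores, for illustration only (not the study's data).
teachers = [40, 55, 60, 63, 70, 81]
majors = [45, 58, 61, 66, 72, 88]
print(round(percent_reaching_median(teachers, majors), 1))  # 33.3
```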

The non-art group makes a better showing when compared with the 
art minors than when compared with the art majors. These data 
would seem to indicate that the Knauber Art Vocabulary test is making 
a sharp distinction between the special word knowledge possessed 
by art majors and teachers over against that possessed by students 
having no art beyond a general appreciation course. The test appears 
to be much less effective in making a distinction between art minors
and art majors or art teachers. It is beyond the scope of this study 
to determine to what extent this test could discriminate between 
students having a certain number of hours in art and possibly certain 
types of courses, but such a study should prove interesting and worth 
while. 


SUMMARY 


The Knauber Art Vocabulary test was given to one hundred fifty-eight
college students and art teachers. The uncorrected reliability
coefficient of the test, determined by the odd-even technique, was
found to be .81 ± .02. The corrected coefficient becomes .90 when
the prophecy formula is applied. The reliability coefficient obtained
by correlating the first and last quarters with the middle half was
.89 ± .01. This coefficient became .94 after being corrected by the
Spearman-Brown formula.

The range of scores was about the same for art majors and art 
teachers, and for art minors and non-art students. The difference 
between the means was statistically reliable when the art-major 
group was compared with the non-art students. The difference 
between the means was not reliable when art majors were compared 
with art teachers, or art majors with art minors, or when art teachers 
were compared with art minors.
BOOK REVIEWS


Ruby Minor. Early Childhood Education. New York: D. Appleton-Century,
1937, pp. xix + 763.


This book on nursery, kindergarten, and primary education is
intended to serve as a textbook or general reference work in teacher- 
training institutions. The keynote is the attempt to strike a balance 
between the claims of the activity or child-centered school on the one 
hand, and those of the traditional or subject-centered school on the 
other. The author aligns herself definitely with those who insist that 
the present needs and felt-interests of the child should form the basis of 
educational procedure. At the same time, however, she recognizes 
the danger of making activity an end in itself, and calls attention to 
the importance of the traditional school subjects in attaining the 
objectives of early childhood education. 

The first section of the book includes a brief history of preschool 
and kindergarten training and a discussion of the objectives and pro- 
gram of early childhood education. Twenty objectives are enumer- 
ated, stated, for the most part, in broad general terms and with 
considerable overlapping from one to another. Most readers will agree 
as to the desirability of these objectives, but the extent to which they 
are, or can be, attained at the kindergarten or primary level is a mat- 
ter of conjecture. The lack of data concerning the attainment of 
these objectives is one of the disappointing features of the book. This 
first section also deals with the nature of individual differences and 
methods of adapting to these differences. The value of tests in the 
recognition of individual differences is stressed, but a section on “Available
Tests” lists only three of the many preschool and primary tests;
the discussion even of these three is not especially praiseworthy. 
There is a brief treatment of personality and temperamental differ- 
ences in young children, characterized by an almost complete lack of 
mention of research in this field. 

The second section of the book is entitled “Some Aspects of
Learning.” The author accepts as a definition of learning “the
acquisition of new responses or the modification of old ones.” The
roles of interest, attention, emotion, memorization, and drill are
discussed. Emphasis is placed on learning through purposeful activities
and on the importance of experience as a basis for growth. The educa- 
tional philosophy expressed is that of Dewey and Kilpatrick, whose 
writings are quoted extensively. 

“Organization of Experience with Reference to a Classification of
Subject-Matter” comprises the content of the third and largest section 
of the book. The treatment of subject-matter in accordance with the 
tenets of the activity school is developed at considerable length and in 
great detail. The general pattern includes the presentation of lists of 
objectives, or desirable outcomes, of each subject, followed by a series 
of suggested experiences or activities through participation in which 
the child is to achieve these outcomes. The efficacy of the proposed 
procedures, however, remains an open question; certainly such scant 
evidence as is presented in this volume is far from conclusive proof 
that the specified objectives are being attained to any measurable 
extent. Presumably, the author’s long experience in primary-school 
work has enabled her to appraise critically the procedures and activi- 
ties which she proposes, but the objective data on which such appraisals 
could be based are conspicuously absent from this volume. 

The personal and professional qualifications of the teacher, records 
and reports which should be kept, and equipment and supplies neces- 
sary for the kindergarten and primary classroom are treated briefly in 
the final section of the book. Extensive bibliographies and lists of 
topics for investigation and discussion are given at the end of each 


chapter. Rocer T. LENNON. 
Bronx, N. Y. 


J. M. Reinhardt. Social Psychology. Philadelphia: J. B. Lippincott
Co., 1938, pp. ix + 467.


The classification of a text in social psychology as sociological is 
usually but a brief way of stating that it is environmentalistic in bias, 
that its treatment in general tends toward the case-study method rather
than the statistical, and that it neglects topics of great interest to the
psychologist while it features other sections which do not generally concern
him. In all these respects Reinhardt’s text is sociological. It is 
dedicated to L. L. Bernard and acknowledges the influence of (but 
reflects only very slightly the theories of) Lewin and the late William 
Stern. Unlike the recent text of Freeman, and especially that of J. F.
Brown, it touches upon political philosophy quite indirectly; yet it
does discuss in some detail the social psychology of economic, social,
and educational maladjustments. The book has some documentation,
although the references at the end of each chapter are more often than
not left unmentioned in the discussions.
After a brief introductory section, Social Psychology gets under way
with a chapter on “biological inheritance and human behavior traits” 
in which the philosophies of Pearl and the more environmentally- 
minded biologists are followed. In Chapter 3 the author discusses in 
considerable detail the supposed effects of food and oxygen hunger, 
qualitative dietary deficiencies, excess or deficiency of sunlight, 
glandular imbalance, and other heterogeneous factors. That so much
attention is paid to these data, which have been gathered in the main
either from anecdotes or from infrahuman animals, is surprising,
especially when one reflects that the author almost totally neglects many
vitally important topics, e.g., that of the measured attitude. Rein- 
hardt has certainly forfeited the right, allegedly held by all sociologists, 
to grumble about the rat-centered psychologist. 

The chapter on the “psycho-social significance of the nervous system”
follows Herrick rather generally in the neurological discussions
and Myerson in its interpretations. This is followed by two well- 
written but, to a psychologist, rather biased and incomplete chapters 
on twin studies. A large percentage of one chapter is devoted to the 
handwriting of twins! But the reviewer’s major objection here is to 
the old-fashioned additive formula presented in the treatment of 
nature and nurture. It must be admitted, however, that throughout 
the book this equation is largely ignored; moreover, in several places 
the author clearly implies that either of these variables is meaningless
if not defined in terms of the other.

The discussions of “divergent social norms and adjustment patterns,”
“culture and personality,” “personality unity and the world of
value,” and “concepts of race and personality” are brief and to the
point. These are followed by a section on “objective measurements,”
which is devoted wholly to data on social differences. These latter,
of course, are considered to be wholly cultural in origin. Similar
conclusions are reached for class and occupational differences (three
chapters). The book ends with sections on “personality and insecurity”
and “the paradox of human nature.”

Reinhardt’s book furnishes another example of the fact that 
although the field of social psychology is gradually coming to possess a
recognized field of its own, the delimiting still remains largely in the 
hands of the textbook writer. The topics of this text are not at all 
similar to those treated by Dunlap or J. F. Brown, and overlap only 
slightly those appearing in Myerson’s tome. Thus, especially as 
Reinhardt’s book is interestingly written, it could well be used along 
with a more politico-philosophical, psychological or psychiatric
treatise to aid in covering a broad field of study. 


Paul R. Farnsworth.
Stanford University. 


M. A. Stanger and E. K. Donohue. Prediction and Prevention of
Reading Difficulties. New York: Oxford University Press, 1937, 
pp. x + 191. 


This book deals with diagnostic and remedial procedures for the 
non-reader of normal intelligence. The major emphasis is on early 
diagnosis and prevention of reading disability rather than later diagno- 
sis and retraining. The entire treatment of the subject is in line with 
Orton’s cerebral dominance theory. The material is presented from 
the teacher’s point of view. 

Part I of the book is a simplified exposition of Orton’s cerebral 
dominance theory as related to language, particularly to reading 
deficiency. This theory is stressed in relation to diagnostic work.
The theory is a
stimulating approach to the problem of reading disability, although 
it should be remembered that it is only one approach to the problem. 
Other theories and techniques than those presented in this book are 
valuable, particularly in working with older children and adults. 

Part II is concerned with diagnostic procedures and measuring 
devices. The test procedures and methods of interpretation are 
closely and adequately presented. The tests are simple and of such a 
nature that they can easily be made and used by the teacher. They 
are designed primarily to discover confusion in lateral dominance. 

Part III is a well-organized set of simple and practical teaching 
procedures for remedial work. Training in left-to-right progression in 
word perception is stressed. The method is an effective combination
of visual, auditory, and motor training. Marked success by its use is 
reported by the authors. The procedures are best adapted to work 
with individuals or very small groups. 

Even though many are not ready to accept the cerebral dominance 
theory, the diagnostic and remedial procedures as outlined by the 
authors are of practical value. The authors’ emphasis upon early 
detection and prevention of reading deficiency is commendable. The 
book will be welcomed by teachers who are concerned with reading 
disability. Additional techniques and materials will be needed, however,
for work with the many children who are less seriously handicapped
in reading.

Roy B. Hackman.
University of Minnesota. 


William Henry Gray. Psychology of Elementary School Subjects.
New York: Prentice-Hall, Inc., 1938, pp. xiii + 459.


This book, in spite of its title, is written for teachers or education 
students, and not for psychologists. As an undergraduate textbook 
it will serve the admirable purpose of introducing the student to 
selected experiments on learning and disability problems in the 
elementary-school subjects. 

A chapter is devoted to each of the following subjects: reading, 
handwriting, arithmetic, spelling, language, social studies, art (music 
and drawing), and physical and health education. For each of these 
an effort is made to present its historical background as a school 
subject, “followed by a discussion of problems in learning, methods of
diagnosing and overcoming students’ difficulties, and methods of
measuring achievement.” An analysis of the book’s contents shows
that about one third is devoted to non-psychological questions of 
history, content, aims, and teaching methods. The other two thirds 
include about twelve per cent on measuring achievement, twelve per 
cent on diagnosis and correction of disabilities, and the remaining 
forty-two per cent on the presentation of experimental work of both 
psychological and educational nature. Similar analysis of each 
chapter shows wide variation in the amount of space devoted to 
psychological and non-psychological topics. For example, spelling 
has only thirty-four per cent of its material psychological (judged on 
a very generous criterion), while handwriting has ninety-two per cent 
so classed. One wonders if the variations reflect the relative amounts 
of experimental work in the various subjects or the author’s selection 
of material. 

Evaluation of this book is difficult. As a textbook for undergraduate
students it would appear to be satisfactory. As a text for graduate
students in either education or psychology, it is too elementary and
too incomplete to be of much value.

C. M. Louttit.
Indiana University.